Here's a zipped-up tree from a staged sample of the website:
https://drive.google.com/file/d/1LKL936tBJ79jpjvlL5vC5uYYwTHsWXiJ/view?usp=sharing

I'd also suggest tagging the commit, so we can find the first commit later
on for reference. I can push the tag after the PR is merged.
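
For anyone who wants to check the archive against a fresh build, here is a rough sketch of the kind of text-only comparison discussed further down the thread (HTML tags stripped, whitespace ignored, missing pages reported). It is not Robert's script, which is not reproduced here; the function names and the two directory arguments are assumptions made for illustration.

```python
import difflib
from html.parser import HTMLParser
from pathlib import Path


class _TextExtractor(HTMLParser):
    """Collects text nodes, skipping the bodies of <script> and <style>."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def visible_text(markup):
    """Return the page text with tags stripped and whitespace collapsed."""
    parser = _TextExtractor()
    parser.feed(markup)
    parser.close()
    # Text nodes are joined with a space, then all whitespace runs collapse.
    return " ".join(" ".join(parser.chunks).split())


def compare_sites(old_root, new_root):
    """Compare every *.html page under old_root with its twin under new_root.

    Returns (missing, changed): `missing` lists pages absent from new_root,
    `changed` maps a page path to a word-level unified diff of its text.
    """
    old_root, new_root = Path(old_root), Path(new_root)
    missing, changed = [], {}
    for old_page in sorted(old_root.rglob("*.html")):
        rel = old_page.relative_to(old_root)
        new_page = new_root / rel
        if not new_page.is_file():
            missing.append(rel.as_posix())
            continue
        old_words = visible_text(
            old_page.read_text(encoding="utf-8", errors="replace")).split()
        new_words = visible_text(
            new_page.read_text(encoding="utf-8", errors="replace")).split()
        if old_words != new_words:
            diff = difflib.unified_diff(
                old_words, new_words, "old", "new", n=2, lineterm="")
            changed[rel.as_posix()] = list(diff)
    return missing, changed
```

One would call something like `compare_sites("old-site", "new-site")`, where both directory names are placeholders for the unzipped archive and the generated Hugo output. Pages whose only difference is code-highlighter markup may still show up in `changed`, since inline tags can split words differently in the two generators.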

On Thu, May 14, 2020 at 10:43 AM Ahmet Altay <al...@google.com> wrote:

>
>
> On Thu, May 14, 2020 at 9:16 AM Aizhamal Nurmamat kyzy <
> aizha...@apache.org> wrote:
>
>> Thank you all for reviewing and validating this pull request. I see that
>> all tests are passing now, should we merge it?
>>
>
> +1 to merging now.
>
> Before the merge, please share a link to an archive copy of the old
> website. After the merge, please try out the live website to see if it is
> working as expected.
>
>
>>
>> On Wed, May 13, 2020, 5:41 PM Ahmet Altay <al...@google.com> wrote:
>>
>>> Thank you! Let's merge it once tests are done.
>>>
>>> On Wed, May 13, 2020 at 5:23 PM Robert Bradshaw <rober...@google.com>
>>> wrote:
>>>
>>>> I took a (non-comprehensive) look at these as well, and didn't see any
>>>> issues, so am happy to sign off on this. Thanks Nam, Brian, Ahmet, and
>>>> everyone else.
>>>>
>>>> On Wed, May 13, 2020 at 7:58 AM Nam Bui <nam....@polidea.com> wrote:
>>>>
>>>>> Hi Ahmet,
>>>>> "Does this mean the internal links (e.g. contribute/team) will
>>>>> disappear?"
>>>>> Yes, I'd like to get rid of them. To make sure they don't confuse
>>>>> people, I replaced every use of "contribute/team" with the external
>>>>> URL. Currently, we only have 2 "redirect_to" links, which are
>>>>> "contribute/team" & "contribute/project/team", so this change won't
>>>>> affect anything else.
>>>>> Also, based on your question, I just added a section to the
>>>>> documentation (CONTRIBUTE.md) that lists the Jekyll features that were
>>>>> replaced or removed, for anyone writing a new blog post or
>>>>> documentation in Hugo.
>>>>>
>>>>
>>> Got it. The main effect will be that if anyone has a bookmark/link to
>>> these pages, those links will no longer work. That is fine if it is
>>> limited to these 2 URLs.
>>>
>>>
>>>>
>>>>>
>>>>> On Wed, May 13, 2020 at 4:17 AM Ahmet Altay <al...@google.com> wrote:
>>>>>
>>>>>> - I reviewed the diff output with Nam's explanations. The change
>>>>>> looks minimal. Large diffs are primarily coming from index and redirect
>>>>>> files. Code blocks have differences, but the content is seemingly
>>>>>> preserved. IIUC, the source of truth is the snippet files anyway. (It
>>>>>> would be good to get one more set of eyes on this.)
>>>>>> - Brian and I reviewed the infrastructure changes. They look
>>>>>> reasonable.
>>>>>>
>>>>>> I think the PR is very close to a mergeable state. Especially if we can
>>>>>> get an archive copy of the current website, I will be comfortable with
>>>>>> the merge.
>>>>>>
>>>>>> And, thank you Nam for your work so far.
>>>>>>
>>>>>> On Tue, May 12, 2020 at 4:13 PM Nam Bui <nam....@polidea.com> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> A new commit that covers Robert's script has been pushed [1], and
>>>>>>> the script output is attached to this email.
>>>>>>>
>>>>>>> Based on the diff output of the script, my strategy was to look at
>>>>>>> the sections with large amounts of removed text, to make sure that no
>>>>>>> content or files were lost. Below are all of the links with a large
>>>>>>> amount of removed content.
>>>>>>>
>>>>>>> - Detection:
>>>>>>> These links lost some of the contents. Fixed!
>>>>>>> + documentation/runners/jstorm/index.html
>>>>>>> + documentation/dsls/sql/calcite/lexical-structure/index.html
>>>>>>> + documentation/dsls/sql/zetasql/data-types/index.html
>>>>>>> + documentation/dsls/sql/zetasql/query-syntax/index.html
>>>>>>>
>>>>>>> - Aliases:
>>>>>>> These links are redirected links. So in Hugo, these HTML files only
>>>>>>> include redirected URLs. I also took a look at them to ensure the
>>>>>>> content was there.
>>>>>>> + documentation/dsls/sql/calcite/lexical/index.html
>>>>>>> + old URLs of blog posts
>>>>>>>
>>>>>>> - Ignore:
>>>>>>> Hugo and Jekyll render code highlighting with different HTML
>>>>>>> structures. Ahmet & Pablo agree with me that it's fair to ignore them
>>>>>>> for now.
>>>>>>> + codeblocks
>>>>>>>
>>>>>>> - Missing files:
>>>>>>> The script reports a “missing file” status for some files:
>>>>>>> + coming-soon.html (this file was used nowhere in Jekyll, so I
>>>>>>> didn’t migrate to Hugo)
>>>>>>> + documentation/dsls/sql/statements/select/index.html (aliases)
>>>>>>> + blog/2019/04/25/beam-2.12.0.html (fixed!)
>>>>>>> + blog/2020/05/08/beam-summit-digital-2020.html (new blog post,
>>>>>>> added!)
>>>>>>> + v2/index.html (this file was used nowhere in Jekyll, so I didn’t
>>>>>>> migrate to Hugo)
>>>>>>> + contribute/team/index.html (mentioned in “redirect_to” below)
>>>>>>> + contribute/project/team/index.html (mentioned in “redirect_to”
>>>>>>> below)
>>>>>>>
>>>>>>> - “redirect_to”:
>>>>>>> In Jekyll, there is a feature called “redirect_to”. For instance,
>>>>>>> you click on an internal link “contribute/team/” to reach the markdown
>>>>>>> “team.md”, then from the markdown file, it redirects you to the external
>>>>>>> URL “https://example.com”.
>>>>>>> However, there is no such feature in Hugo. My solution is to
>>>>>>> directly replace “contribute/team/” with “https://example.com”.
>>>>>>>
>>>>>>
>>>>>> Does this mean the internal links (e.g. contribute/team) will
>>>>>> disappear?
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> [1] https://github.com/apache/beam/pull/11554
>>>>>>>
>>>>>>> On Mon, May 11, 2020 at 7:34 PM Nam Bui <nam....@polidea.com> wrote:
>>>>>>>
>>>>>>>> Updates for today:
>>>>>>>> - Thanks Brian & Ahmet for your reviews. I replied to some of the
>>>>>>>> questions and also made changes in response to the reviews [1].
>>>>>>>> - I see that the new blog post was merged yesterday, so I added it
>>>>>>>> to the PR as well.
>>>>>>>>
>>>>>>>> I briefly tried Robert's script on the build output of the old and
>>>>>>>> new websites. It seemed to work well for detecting missing files (or
>>>>>>>> possibly wrong links leading to missing files). I will push another
>>>>>>>> commit to fix all that up, hopefully tomorrow.
>>>>>>>>
>>>>>>>> [1]
>>>>>>>> https://github.com/apache/beam/pull/11554#issuecomment-626792031
>>>>>>>>
>>>>>>>> Best regards,
>>>>>>>> Nam
>>>>>>>>
>>>>>>>>
>>>>>>>> On Mon, May 11, 2020 at 9:01 AM Nam Bui <nam....@polidea.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> @Ahmet: Yeah, it's all clear to me. :)
>>>>>>>>> @Robert: Thanks for your ideas and also the script. They really
>>>>>>>>> help me with my work.
>>>>>>>>>
>>>>>>>>> Best regards!
>>>>>>>>>
>>>>>>>>> On Sat, May 9, 2020 at 2:10 AM Ahmet Altay <al...@google.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> This sounds reasonable to me. Thank you. Nam, does it make sense
>>>>>>>>>> to you?
>>>>>>>>>>
>>>>>>>>>> On Fri, May 8, 2020 at 11:53 AM Robert Bradshaw <
>>>>>>>>>> rober...@google.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> I'd really like to not see this work go to waste, both the
>>>>>>>>>>> original revision, the further efforts Nam has done in making it
>>>>>>>>>>> more manageable to review, and the work put into reviewing this so
>>>>>>>>>>> far, so we can get the benefits of being on Hugo. How about this
>>>>>>>>>>> for a concrete proposal:
>>>>>>>>>>>
>>>>>>>>>>> (1) We get "standard" approval from one or more committers for
>>>>>>>>>>> the infrastructure changes, just as with any other PR. Brian has
>>>>>>>>>>> already started this, but if others could step up as well that'd
>>>>>>>>>>> be great.
>>>>>>>>>>>
>>>>>>>>>>> (2) Reviewers (and authors) typically count on (or request)
>>>>>>>>>>> sufficient automated test coverage to augment the fact that their
>>>>>>>>>>> eyeballs are fallible, which is something that is missing here (and
>>>>>>>>>>> given the size of the change not easily compensated for by a more
>>>>>>>>>>> detailed manual review). How about we use the script above (or
>>>>>>>>>>> similar) as an automated test to validate the website's contents
>>>>>>>>>>> haven't (materially) changed? I feel we've validated enough that
>>>>>>>>>>> the style looks good via spot checking (which is something that
>>>>>>>>>>> should work on all pages if it works on one). The diff between the
>>>>>>>>>>> current site and the newly generated site should be empty (it
>>>>>>>>>>> might already be [1]), or at least we should get a stamp of
>>>>>>>>>>> approval on the plain-text diff (which should be small), before
>>>>>>>>>>> merging.
>>>>>>>>>>>
>>>>>>>>>>> (3) To make things easier, everyone holds off on making any
>>>>>>>>>>> changes to the old site until a fixed future date (say, next
>>>>>>>>>>> Wednesday). Hopefully we can get it merged by then. If not, a
>>>>>>>>>>> condition for merging would be a commitment to incorporating new
>>>>>>>>>>> changes made after this date.
>>>>>>>>>>>
>>>>>>>>>>> Does this sound reasonable?
>>>>>>>>>>>
>>>>>>>>>>> - Robert
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> [1] I'd be curious as to how small the diff already is, but my
>>>>>>>>>>> script relies on local directories with the generated HTML, which
>>>>>>>>>>> I don't have handy at the moment.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Fri, May 8, 2020 at 10:45 AM Robert Bradshaw <
>>>>>>>>>>> rober...@google.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Here's a script that we could run on the old and new sites that
>>>>>>>>>>>> should quickly catch any major issues but not get caught up in
>>>>>>>>>>>> formatting minutiae.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Fri, May 8, 2020 at 10:23 AM Robert Bradshaw <
>>>>>>>>>>>> rober...@google.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> On Fri, May 8, 2020 at 9:58 AM Aizhamal Nurmamat kyzy <
>>>>>>>>>>>>> aizha...@apache.org> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> I understand the difficulty, and this certainly comes with
>>>>>>>>>>>>>> lessons learned for future similar projects.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> To your questions Robert:
>>>>>>>>>>>>>> (1 and 2) I will commit to review the text in the resulting
>>>>>>>>>>>>>> pages. I will try and use some automation to extract visible
>>>>>>>>>>>>>> text from each page and diff it with the current state of the
>>>>>>>>>>>>>> website. I can do this starting next week. From some quick
>>>>>>>>>>>>>> research, there seem to be tools that help with this analysis (
>>>>>>>>>>>>>> https://stackoverflow.com/questions/3286955/compare-two-websites-and-see-if-they-are-equal
>>>>>>>>>>>>>> )
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> At first glance it looks like these tools would give diffs
>>>>>>>>>>>>> that are *larger* than the 47K one we're struggling to review
>>>>>>>>>>>>> here.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>> By remaining in this state, we hold others up from making
>>>>>>>>>>>>>> changes, or we increase the amount of work needed after merging
>>>>>>>>>>>>>> to port over changes that may be missed. If we move forward,
>>>>>>>>>>>>>> new changes can be done on top of the new website.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> I agree we don't want to hold others up from making changes.
>>>>>>>>>>>>> However, the amount of work to port changes over seems small in
>>>>>>>>>>>>> comparison to everything else that is being discussed here. (It
>>>>>>>>>>>>> also provides good incentives to reach the bar quickly and has
>>>>>>>>>>>>> the advantage of falling on the right people.) (3) will still
>>>>>>>>>>>>> take some time.
>>>>>>>>>>>>>
>>>>>>>>>>>>> If we go this route, we're lowering the bar for doc changes,
>>>>>>>>>>>>> but not removing it.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>> (3) This makes sense. Brian, would you be able to spend some
>>>>>>>>>>>>>> time to look at the automation changes (build files and
>>>>>>>>>>>>>> scripts) to ensure they look fine?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I would also like to write a post mortem to extract lessons
>>>>>>>>>>>>>> learned and avoid this situation in the future.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Fri, May 8, 2020 at 9:44 AM Brian Hulette <
>>>>>>>>>>>>>> bhule...@google.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I'm -0 on merging as-is. I have the same concerns as Robert
>>>>>>>>>>>>>>> and he's voiced them very well so I won't waste time re-airing
>>>>>>>>>>>>>>> them.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> (2) I spot checked the content, pulled out some common
>>>>>>>>>>>>>>>> patterns, and
>>>>>>>>>>>>>>>> it mostly looks good, but there were also some issues (e.g.
>>>>>>>>>>>>>>>> several
>>>>>>>>>>>>>>>> pages were replaced with the contents from entirely
>>>>>>>>>>>>>>>> different pages).
>>>>>>>>>>>>>>>> I would be more comfortable if, say, a smoke test of
>>>>>>>>>>>>>>>> comparing the old
>>>>>>>>>>>>>>>> and new sites, with html tags stripped and ignoring
>>>>>>>>>>>>>>>> whitespace,
>>>>>>>>>>>>>>>> yielded what should be empty diffs.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Can you share any details about this analysis?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>> It was basically paging through the diff, adding things to the
>>>>>>>>>>>>> sed script, and then looking at more diffs.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>> +1 for verifying the old and new are the same by diffing the
>>>>>>>>>>>>>>> output.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> (3) It'd be good to have someone give a stamp of approval
>>>>>>>>>>>>>>>> on the
>>>>>>>>>>>>>>>> infrastructure changes, at least to validate that we're not
>>>>>>>>>>>>>>>> going to
>>>>>>>>>>>>>>>> be taking on extra tech debt with regard to jenkins
>>>>>>>>>>>>>>>> stability and
>>>>>>>>>>>>>>>> developer workflow. I see that Brian has at least looked at
>>>>>>>>>>>>>>>> this some.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> My involvement so far was just recognizing a problem
>>>>>>>>>>>>>>> (creating root-owned files on jenkins workers) and helping to
>>>>>>>>>>>>>>> fix it. If there's anyone available who's familiar with the
>>>>>>>>>>>>>>> website infrastructure it would be great if they could take a
>>>>>>>>>>>>>>> look instead (if not I could probably acquaint myself enough
>>>>>>>>>>>>>>> to review).
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Thu, May 7, 2020 at 11:57 PM Robert Bradshaw <
>>>>>>>>>>>>>>> rober...@google.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> This is a tough situation.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> It would have been much better if this transition was
>>>>>>>>>>>>>>>> structured in
>>>>>>>>>>>>>>>> such a way that the review was more manageable (e.g. the
>>>>>>>>>>>>>>>> suggestion of
>>>>>>>>>>>>>>>> scripts, not mixing in voluminous unnecessary changes like
>>>>>>>>>>>>>>>> whitespace,
>>>>>>>>>>>>>>>> and not updating content), and possibly even incrementally
>>>>>>>>>>>>>>>> (e.g. the
>>>>>>>>>>>>>>>> new site would have been developed over multiple PRs in a
>>>>>>>>>>>>>>>> subdomain or
>>>>>>>>>>>>>>>> subdirectory while being worked on). But hindsight is 20/20
>>>>>>>>>>>>>>>> and no
>>>>>>>>>>>>>>>> one, myself included, thought to bring this up when the
>>>>>>>>>>>>>>>> original
>>>>>>>>>>>>>>>> migration was proposed, so this is more something to keep
>>>>>>>>>>>>>>>> in mind for
>>>>>>>>>>>>>>>> the future. I also appreciate the efforts that have been
>>>>>>>>>>>>>>>> made to clean
>>>>>>>>>>>>>>>> things up (e.g. preserving history) and address feedback.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> So, where do we go from here? My first thought is that I
>>>>>>>>>>>>>>>> really don't
>>>>>>>>>>>>>>>> want to set a precedent that just because a PR "will
>>>>>>>>>>>>>>>> require a large
>>>>>>>>>>>>>>>> effort" and in a state that if we don't "move forward and
>>>>>>>>>>>>>>>> merge what
>>>>>>>>>>>>>>>> we have now" then "work done so far will be lost" means
>>>>>>>>>>>>>>>> that we think
>>>>>>>>>>>>>>>> it's OK to forgo doing a proper review.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On the other hand, there are some mitigating factors with
>>>>>>>>>>>>>>>> this being
>>>>>>>>>>>>>>>> the website and not the code in that "bugs," though possibly
>>>>>>>>>>>>>>>> embarrassing, won't break production pipelines or data
>>>>>>>>>>>>>>>> loss, and
>>>>>>>>>>>>>>>> though the source is technically part of the release, when
>>>>>>>>>>>>>>>> we find
>>>>>>>>>>>>>>>> something to fix we can fix the live website much more
>>>>>>>>>>>>>>>> quickly than go
>>>>>>>>>>>>>>>> through the whole release process and convince people to
>>>>>>>>>>>>>>>> upgrade. (I
>>>>>>>>>>>>>>>> recognize accepting this argument is, to some degree at
>>>>>>>>>>>>>>>> least, saying
>>>>>>>>>>>>>>>> that we don't care about the correctness of docs as much as
>>>>>>>>>>>>>>>> so-called
>>>>>>>>>>>>>>>> "real" code, if we go there.)
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> If we decide to go ahead and merge (and I would not
>>>>>>>>>>>>>>>> object), there are
>>>>>>>>>>>>>>>> some things I would like to see.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> (1) I would like to understand what we would do afterwards
>>>>>>>>>>>>>>>> to "review
>>>>>>>>>>>>>>>> the outcome, and ensure that all the content is there," and
>>>>>>>>>>>>>>>> why it
>>>>>>>>>>>>>>>> can't be done before merging instead. (Is it because it'd
>>>>>>>>>>>>>>>> take time
>>>>>>>>>>>>>>>> and we don't want to incorporate changes that are made to
>>>>>>>>>>>>>>>> the website
>>>>>>>>>>>>>>>> in the meantime? I think that boat has sailed, but maybe we
>>>>>>>>>>>>>>>> can avoid
>>>>>>>>>>>>>>>> making it worse...)
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> (2) I spot checked the content, pulled out some common
>>>>>>>>>>>>>>>> patterns, and
>>>>>>>>>>>>>>>> it mostly looks good, but there were also some issues (e.g.
>>>>>>>>>>>>>>>> several
>>>>>>>>>>>>>>>> pages were replaced with the contents from entirely
>>>>>>>>>>>>>>>> different pages).
>>>>>>>>>>>>>>>> I would be more comfortable if, say, a smoke test of
>>>>>>>>>>>>>>>> comparing the old
>>>>>>>>>>>>>>>> and new sites, with html tags stripped and ignoring
>>>>>>>>>>>>>>>> whitespace,
>>>>>>>>>>>>>>>> yielded what should be empty diffs.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> (3) It'd be good to have someone give a stamp of approval
>>>>>>>>>>>>>>>> on the
>>>>>>>>>>>>>>>> infrastructure changes, at least to validate that we're not
>>>>>>>>>>>>>>>> going to
>>>>>>>>>>>>>>>> be taking on extra tech debt with regard to jenkins
>>>>>>>>>>>>>>>> stability and
>>>>>>>>>>>>>>>> developer workflow. I see that Brian has at least looked at
>>>>>>>>>>>>>>>> this some.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> - Robert
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Thu, May 7, 2020 at 12:40 PM Aizhamal Nurmamat kyzy
>>>>>>>>>>>>>>>> <aizha...@apache.org> wrote:
>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>> > Thank you Ahmet.
>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>> > Robert/Brian, what do you think?
>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>> > The website staging and pre commit tests have passed [1]. If nobody has objections, we could merge it soon.
>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>> > [1] https://github.com/apache/beam/pull/11554
>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>> > On Thu, May 7, 2020 at 11:38 AM Ahmet Altay <
>>>>>>>>>>>>>>>> al...@google.com> wrote:
>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>> >> On Thu, May 7, 2020 at 10:50 AM Aizhamal Nurmamat kyzy <
>>>>>>>>>>>>>>>> aizha...@apache.org> wrote:
>>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>>> >>> Thanks for the writeup Ahmet.
>>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>>> >>> My bias is to move forward and merge the PR. After this, we'll review the outcome, and ensure that all the content is there. Nam will help us with that.
>>>>>>>>>>>>>>>> >>> The reason that I'd like to move forward and merge what we have now is that if we don't do that, the work done so far will be lost.
>>>>>>>>>>>>>>>> >>> We'll make sure to stage the website in its current state, and use that as reference/archive to ensure all the content has been moved.
>>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>>> >>> Is this reasonable to everyone?
>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>> >> This is reasonable to me. I agree with your reasons.
>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>> >> What do others think?
>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>>> >>> On Wed, May 6, 2020 at 7:07 PM Ahmet Altay <
>>>>>>>>>>>>>>>> al...@google.com> wrote:
>>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>>> >>>> On Wed, May 6, 2020 at 2:33 PM Aizhamal Nurmamat kyzy <
>>>>>>>>>>>>>>>> aizha...@apache.org> wrote:
>>>>>>>>>>>>>>>> >>>>>>
>>>>>>>>>>>>>>>> >>>>>>
>>>>>>>>>>>>>>>> >>>>>> > 1) Currently, the main blocker for merging is Staging Test Failures.
>>>>>>>>>>>>>>>> >>>>>>
>>>>>>>>>>>>>>>> >>>>>> That and finishing the review. (Is someone tracking/coordinating this?)
>>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>>> >>>>> I am coordinating the work on the failed tests, but I would need other committers' help to perform the review. @Ahmet, could you help us prioritize the review for this PR?
>>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>>> >>>> The problem is there are too many manual changes. Reviewing this change in this form will require a large effort. I do not think I can interrupt other projects to prioritize reviews on this PR. IMO, we have a few options:
>>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>>> >>>> - The PR could be restructured in the format suggested in this thread: a first commit for infrastructure changes from Jekyll to Hugo, a second commit for a script that will convert the majority of the content, a third commit for the execution of the script, and a fourth commit for the additional manual content changes. If Nam can get to this form, people on this thread (myself/Robert/Pablo/Brian) can review the changes.
>>>>>>>>>>>>>>>> >>>> - Another option is to accept that we already invested in this transition and that overall this is a good change, and merge the PR more or less in its current form (with tests fixed and open comments addressed) even though it has issues, and then over time fix the issues we encounter. There was already some amount of review and visual comparison; we risk losing some recent content changes, but I am assuming this will not be much. If Nam can commit to comparing the two sites after a merge and fixing the majority of the delta, this might be a viable option.
>>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>>> >>>> Another thing we can do is archive/store a read-only copy of the current website at an "archive" URL temporarily instead of completely deleting it. It will give us a baseline for a while to go back to the old content and move any missing data. (And maybe someone can come up with an innovative way to compare the textual content of both sites.) A note on the stop-the-world approach: I believe we are already failing on that, with merge conflicts showing up on the PR. It will be better for us to complete the transition as soon as possible. Fixing after the initial merge might be a simpler task, especially if we can archive the old site.
>>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>>> >>>>>>
>>>>>>>>>>>>>>>> >>>>>> > Michal showed Nam how to handle the 1st test, which was about a missing Apache License.
>>>>>>>>>>>>>>>> >>>>>> >
>>>>>>>>>>>>>>>> >>>>>> > However, the 2nd and 3rd test failures looked like some kind of permissions error on the Jenkins worker, not something that can be configured in code. According to the Jenkins logs, the 2nd test failed because of website/www/site/themes and the 3rd test failed because of website/www/node_modules; both are auto-generated on build. Can someone help Nam look into this?
>>>>>>>>>>>>>>>> >>>>>> >
>>>>>>>>>>>>>>>> >>>>>> > RAT ("Run RAT PreCommit") — FAILURE
>>>>>>>>>>>>>>>> >>>>>> > Website_Stage_GCS ("Run Website_Stage_GCS
>>>>>>>>>>>>>>>> PreCommit") — FAILURE
>>>>>>>>>>>>>>>> >>>>>> > Website_Stage_GCS ("Run Website_Stage_GCS
>>>>>>>>>>>>>>>> PreCommit") — FAILURE
>>>>>>>>>>>>>>>> >>>>>> >
>>>>>>>>>>>>>>>> >>>>>> > 2) Are there any other blockers for merging? @Ahmet/Robert/others please share if there are any other blockers.
>>>>>>>>>>>>>>>> >>>>>> >
>>>>>>>>>>>>>>>> >>>>>> >
>>>>>>>>>>>>>>>> >>>>>> > [1] https://github.com/gohugoio/hugo/pull/4494
>>>>>>>>>>>>>>>> >>>>>> >
>>>>>>>>>>>>>>>> >>>>>> >
>>>>>>>>>>>>>>>> >>>>>> > On Wed, May 6, 2020 at 10:19 AM Robert Bradshaw <
>>>>>>>>>>>>>>>> rober...@google.com> wrote:
>>>>>>>>>>>>>>>> >>>>>> >>
>>>>>>>>>>>>>>>> >>>>>> >> On Mon, May 4, 2020 at 7:07 PM Ahmet Altay <
>>>>>>>>>>>>>>>> al...@google.com> wrote:
>>>>>>>>>>>>>>>> >>>>>> >> >
>>>>>>>>>>>>>>>> >>>>>> >> >> On Mon, May 4, 2020 at 6:30 PM Robert Bradshaw
>>>>>>>>>>>>>>>> <rober...@google.com> wrote:
>>>>>>>>>>>>>>>> >>>>>> >> >>>
>>>>>>>>>>>>>>>> >>>>>> >> >>> I took the massive commit and split it up into:
>>>>>>>>>>>>>>>> >>>>>> >> >>>
>>>>>>>>>>>>>>>> >>>>>> >> >>> (1) Infrastructure changes (basically everything outside of
>>>>>>>>>>>>>>>> >>>>>> >> >>> website/www/site/content),
>>>>>>>>>>>>>>>> >>>>>> >> >>> (2) Sed script changes, and
>>>>>>>>>>>>>>>> >>>>>> >> >>> (3) Manual changes (everything not in (1) and (2)).
>>>>>>>>>>>>>>>> >>>>>> >> >
>>>>>>>>>>>>>>>> >>>>>> >> >
>>>>>>>>>>>>>>>> >>>>>> >> > Thank you Robert. This makes it much easier. What is the source of the sed script? I am not sure why some of those lines are there. It would be much easier for us to comment on the script source if it is reviewable somewhere.
>>>>>>>>>>>>>>>> >>>>>> >>
>>>>>>>>>>>>>>>> >>>>>> >> I just gathered up common patterns as I was
>>>>>>>>>>>>>>>> trying to go through and
>>>>>>>>>>>>>>>> >>>>>> >> review the files... Mostly it was an exercise in
>>>>>>>>>>>>>>>> finding a compact
>>>>>>>>>>>>>>>> >>>>>> >> representation for the delta, not trying to be a
>>>>>>>>>>>>>>>> perfect conversion.
>>>>>>>>>>>>>>>> >>>>>> >> (I do think in retrospect, if we do something
>>>>>>>>>>>>>>>> like this again, it
>>>>>>>>>>>>>>>> >>>>>> >> would be preferable to commit a script that does
>>>>>>>>>>>>>>>> the auto-conversion
>>>>>>>>>>>>>>>> >>>>>> >> (maybe even with some patch files for manual
>>>>>>>>>>>>>>>> changes) both for ease of
>>>>>>>>>>>>>>>> >>>>>> >> reviewing and to avoid the stop-the-world
>>>>>>>>>>>>>>>> situation we're in now. (I'm
>>>>>>>>>>>>>>>> >>>>>> >> still worried that some changes will get lost in
>>>>>>>>>>>>>>>> the shuffle.)
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
