This sounds reasonable to me. Thank you. Nam, does it make sense to you?

On Fri, May 8, 2020 at 11:53 AM Robert Bradshaw <rober...@google.com> wrote:

> I'd really like to not see this work go to waste, both the original
> revision, the further efforts Nam has done in making it more manageable to
> review, and the work put into reviewing this so far, so we can get the
> benefits of being on Hugo. How about this for a concrete proposal:
>
> (1) We get "standard" approval from one or more committers for the
> infrastructure changes, just as with any other PR. Brian has
> already started this, but if others could step up as well that'd be great.
>
> (2) Reviewers (and authors) typically count on (or request) sufficient
> automated test coverage to augment the fact that their eyeballs are
> fallible, which is something that is missing here (and given the size of
> the change not easily compensated for by a more detailed manual review).
> How about we use the script above (or similar) as an automated test to
> validate the website's contents haven't (materially) changed. I feel we've
> validated enough that the style looks good via spot checking (which is
> something that should work on all pages if it works on one). The diff
> between the current site and the newly generated site should be empty (it
> might already be [1]), or at least we should get a stamp of approval on the
> plain-text diff (which should be small), before merging.
>
> (3) To make things easier, everyone holds off on making any changes to the
> old site until a fixed future date (say, next Wednesday). Hopefully we can
> get it merged by then. If not, a condition for merging would be a
> commitment incorporating new changes after this date.
>
> Does this sound reasonable?
>
> - Robert
>
>
>
> [1] I'd be curious as to how small the diff already is, but my script
> relies on local directories with the generated HTML, which I don't have
> handy at the moment.
>
>
>
> On Fri, May 8, 2020 at 10:45 AM Robert Bradshaw <rober...@google.com>
> wrote:
>
>> Here's a script that we could run on the old and new sites that should
>> quickly catch any major issues but not get caught up in formatting minutia.
>>
>>
>>
>> On Fri, May 8, 2020 at 10:23 AM Robert Bradshaw <rober...@google.com>
>> wrote:
>>
>>> On Fri, May 8, 2020 at 9:58 AM Aizhamal Nurmamat kyzy <
>>> aizha...@apache.org> wrote:
>>>
>>>> I understand the difficulty, and this certainly comes with lessons
>>>> learned for future similar projects.
>>>>
>>>> To your questions Robert:
>>>> (1 and 2) I will commit to review the text in the resulting pages. I
>>>> will try and use some automation to extract visible text from each page and
>>>> diff it with the current state of the website. I can do this starting next
>>>> week. From some quick research, there seem to be tools that help with this
>>>> analysis (
>>>> https://stackoverflow.com/questions/3286955/compare-two-websites-and-see-if-they-are-equal
>>>> )
>>>>
>>>
>>> At first glance it looks like these tools would give diffs that are
>>> *larger* than the 47K one we're struggling to review here.
>>>
>>>
>>>> By remaining in this state, we hold others up from making changes, or
>>>> we increase the amount of work needed after merging to port over changes
>>>> that may be missed. If we move forward, new changes can be done on top of
>>>> the new website.
>>>>
>>>
>>> I agree we don't want to hold others up from making changes. However,
>>> the amount of work to port changes over seems small in comparison to
>>> everything else that is being discussed here. (It also provides good
>>> incentives to reach the bar quickly and has the advantage of falling on the
>>> right people.) (3) will still take some time.
>>>
>>> If we go this route, we're lowering the bar for doc changes, but not
>>> removing it.
>>>
>>>
>>>> (3) This makes sense. Brian, would you be able to spend some time to
>>>> look at the automation changes (build files and scripts) to ensure they
>>>> look fine?
>>>>
>>>> I would also like to write a post mortem to extract lessons learned and
>>>> avoid this situation in the future.
>>>>
>>>>
>>>> On Fri, May 8, 2020 at 9:44 AM Brian Hulette <bhule...@google.com>
>>>> wrote:
>>>>
>>>>> I'm -0 on merging as-is. I have the same concerns as Robert and he's
>>>>> voiced them very well so I won't waste time re-airing them.
>>>>>
>>>>> (2) I spot checked the content, pulled out some common patterns, and
>>>>>> it mostly looks good, but there were also some issues (e.g. several
>>>>>> pages were replaced with the contents from entirely different pages).
>>>>>> I would be more comfortable if, say, a smoke test of comparing the old
>>>>>> and new sites, with html tags stripped and ignoring whitespace,
>>>>>> yielded what should be empty diffs.
>>>>>>
>>>>>
>>>>> Can you share any details about this analysis?
>>>>>
>>>>
>>> It was basically paging through the diff, adding things to the sed
>>> script, and then looking at more diffs.
>>>
>>>
>>>> +1 for verifying the old and new are the same by diffing the output.
>>>>>
>>>>>
>>>>>> (3) It'd be good to have someone give a stamp of approval on the
>>>>>> infrastructure changes, at least to validate that we're not going to
>>>>>> be taking on extra tech debt with regard to jenkins stability and
>>>>>> developer workflow. I see that Brian has at least looked at this some.
>>>>>
>>>>>
>>>>> My involvement so far was just recognizing a problem (creating
>>>>> root-owned files on jenkins workers) and helping to fix it. If there's
>>>>> anyone available who's familiar with the website infrastructure it would 
>>>>> be
>>>>> great if they could take a look instead (if not I could probably acquaint
>>>>> myself enough to review).
>>>>>
>>>>> On Thu, May 7, 2020 at 11:57 PM Robert Bradshaw <rober...@google.com>
>>>>> wrote:
>>>>>
>>>>>> This is a tough situation.
>>>>>>
>>>>>> It would have been much better if this transition was structured in
>>>>>> such a way that the review was more manageable (e.g. the suggestion of
>>>>>> scripts, not mixing in voluminous unnecessary changes like whitespace,
>>>>>> and not updating content), and possibly even incrementally (e.g. the
>>>>>> new site would have been developed over multiple PRs in a subdomain or
>>>>>> subdirectory while being worked on). But hindsight is 20/20 and no
>>>>>> one, myself included, thought to bring this up when the original
>>>>>> migration was proposed, so this is more something to keep in mind for
>>>>>> the future. I also appreciate the efforts that have been made to clean
>>>>>> things up (e.g. preserving history) and address feedback.
>>>>>>
>>>>>> So, where do we go from here? My first thought is that I really don't
>>>>>> want to set a precedent that just because a PR "will require a large
>>>>>> effort" and in a state that if we don't "move forward and merge what
>>>>>> we have now" then "work done so far will be lost" means that we think
>>>>>> it's OK to forgo doing a proper review.
>>>>>>
>>>>>> On the other hand, there are some mitigating factors with this being
>>>>>> the website and not the code in that "bugs," though possibly
>>>>>> embarrassing, won't break production pipelines or data loss, and
>>>>>> though the source is technically part of the release, when we find
>>>>>> something to fix we can fix the live website much more quickly than go
>>>>>> through the whole release process and convince people to upgrade. (I
>>>>>> recognize accepting this argument is, to some degree at least, saying
>>>>>> that we don't care about the correctness of docs as much as so-called
>>>>>> "real" code, if we go there.)
>>>>>>
>>>>>> If we decide to go ahead and merge (and I would not object), there are
>>>>>> some things I would like to see.
>>>>>>
>>>>>> (1) I would like to understand what we would do afterwards to "review
>>>>>> the outcome, and ensure that all the content is there," and why it
>>>>>> can't be done before merging instead. (Is it because it'd take time
>>>>>> and we don't want to incorporate changes that are made to the website
>>>>>> in the meantime? I think that boat has sailed, but maybe we can avoid
>>>>>> making it worse...)
>>>>>>
>>>>>> (2) I spot checked the content, pulled out some common patterns, and
>>>>>> it mostly looks good, but there were also some issues (e.g. several
>>>>>> pages were replaced with the contents from entirely different pages).
>>>>>> I would be more comfortable if, say, a smoke test of comparing the old
>>>>>> and new sites, with html tags stripped and ignoring whitespace,
>>>>>> yielded what should be empty diffs.
>>>>>>
>>>>>> (3) It'd be good to have someone give a stamp of approval on the
>>>>>> infrastructure changes, at least to validate that we're not going to
>>>>>> be taking on extra tech debt with regard to jenkins stability and
>>>>>> developer workflow. I see that Brian has at least looked at this some.
>>>>>>
>>>>>> - Robert
>>>>>>
>>>>>>
>>>>>> On Thu, May 7, 2020 at 12:40 PM Aizhamal Nurmamat kyzy
>>>>>> <aizha...@apache.org> wrote:
>>>>>> >
>>>>>> > Thank you Ahmet.
>>>>>> >
>>>>>> > Robert/Brian, what do you think?
>>>>>> >
>>>>>> > The website staging and pre commit tests have passed [1]. If nobody
>>>>>> has objections, we could merge it soon.
>>>>>> >
>>>>>> >
>>>>>> > [1] https://github.com/apache/beam/pull/11554
>>>>>> >
>>>>>> > On Thu, May 7, 2020 at 11:38 AM Ahmet Altay <al...@google.com>
>>>>>> wrote:
>>>>>> >>
>>>>>> >>
>>>>>> >>
>>>>>> >> On Thu, May 7, 2020 at 10:50 AM Aizhamal Nurmamat kyzy <
>>>>>> aizha...@apache.org> wrote:
>>>>>> >>>
>>>>>> >>> Thanks for the writeup Ahmet.
>>>>>> >>>
>>>>>> >>> My bias is to move forward and merge the PR. After this, we'll
>>>>>> review the outcome, and ensure that all the content is there. Nam will 
>>>>>> help
>>>>>> us with that.
>>>>>> >>> The reason that I'd like to move forward and merge what we have
>>>>>> now - is that if we don't do that, the work done so far will be lost.
>>>>>> >>> We'll make sure to stage the website in its current state, and
>>>>>> use that as reference/archive to ensure all the content have been moved.
>>>>>> >>>
>>>>>> >>> Is this reasonable to everyone?
>>>>>> >>
>>>>>> >>
>>>>>> >> This is reasonable to me. I agree with your reasons.
>>>>>> >>
>>>>>> >> What do others think?
>>>>>> >>
>>>>>> >>>
>>>>>> >>>
>>>>>> >>>
>>>>>> >>> On Wed, May 6, 2020 at 7:07 PM Ahmet Altay <al...@google.com>
>>>>>> wrote:
>>>>>> >>>>
>>>>>> >>>>
>>>>>> >>>>
>>>>>> >>>> On Wed, May 6, 2020 at 2:33 PM Aizhamal Nurmamat kyzy <
>>>>>> aizha...@apache.org> wrote:
>>>>>> >>>>>>
>>>>>> >>>>>>
>>>>>> >>>>>> > 1) Currently, the main blocker for merging is Staging Test
>>>>>> Failures.
>>>>>> >>>>>>
>>>>>> >>>>>> That and finishing the review. (Is someone
>>>>>> tracking/coordinating this?)
>>>>>> >>>>>
>>>>>> >>>>>
>>>>>> >>>>> I am coordinating the work on the failed tests, but I would
>>>>>> need other committer's help to perform the review. @Ahmet, could you help
>>>>>> us prioritize the review for this PR?
>>>>>> >>>>
>>>>>> >>>>
>>>>>> >>>> The problem is there are too many manual changes. Reviewing this
>>>>>> change in this form will require a large effort. I do not think I can
>>>>>> interrupt other projects to prioritize reviews on this PR. IMO, we have a
>>>>>> few options:
>>>>>> >>>>
>>>>>> >>>> - PR to be restructured in the format suggested in this thread.
>>>>>> A commit for infrastructure changes from Jekyll to hugo. A second commit
>>>>>> for a script that will convert the majority of the content. A third 
>>>>>> commit
>>>>>> for the execution of the script. And a fourth commit for the additional
>>>>>> manual content changes. If Nam can get to this form, people on this 
>>>>>> thread
>>>>>> myself/Robert/Pablo/Brian can review the changes.
>>>>>> >>>> - Another option is, we can accept that we already invested in
>>>>>> this transition and overall this is a good change, and merge the PR more 
>>>>>> or
>>>>>> less in its current form (with tests fixed and open comments addressed)
>>>>>> even though it has issues. And then overtime fix the issues we encounter.
>>>>>> There was already some amount of review and visual comparisons, we risk
>>>>>> losing some recent content changes but I am assuming this will not be 
>>>>>> much.
>>>>>> If Nam can commit to compare two sites after a merge, fixing the majority
>>>>>> of the delta, this might be a viable option.
>>>>>> >>>>
>>>>>> >>>> Another thing we can do, we can archive/store a read-only copy
>>>>>> of the current website in an "archive" url temporarily instead of
>>>>>> completely deleting it. It will give us a baseline for a while to go back
>>>>>> to the old content and move any missing data. (And maybe, someone can 
>>>>>> come
>>>>>> up with an innovative way to compare the textual content of both sites.) 
>>>>>> A
>>>>>> note on the stop world approach, I believe we are already failing on that
>>>>>> with merge conflicts showing up on the PR. It will be better for us to
>>>>>> complete the transition as soon as possible. Fixing after the initial 
>>>>>> merge
>>>>>> might be a simpler task, especially if we can archive the old site.
>>>>>> >>>>
>>>>>> >>>>>
>>>>>> >>>>>
>>>>>> >>>>>>
>>>>>> >>>>>> > Michal showed Nam how to handle the 1st test which was about
>>>>>> Apache License missing.
>>>>>> >>>>>> >
>>>>>> >>>>>> > However, the 2nd and 3rd tests looked like some kind of
>>>>>> permissions error on the Jenkins worker, not to be configured by code. 
>>>>>> For
>>>>>> more details based on Jenkin logs, the 2nd test failed because of
>>>>>> website/www/site/themes and the 3rd test failed because of
>>>>>> website/www/node_modules, they are both auto-generated files on build. 
>>>>>> Can
>>>>>> someone help Nam to look into this?
>>>>>> >>>>>> >
>>>>>> >>>>>> > RAT ("Run RAT PreCommit") — FAILURE
>>>>>> >>>>>> > Website_Stage_GCS ("Run Website_Stage_GCS PreCommit") —
>>>>>> FAILURE
>>>>>> >>>>>> > Website_Stage_GCS ("Run Website_Stage_GCS PreCommit") —
>>>>>> FAILURE
>>>>>> >>>>>> >
>>>>>> >>>>>> > 2) Are there any other blockers for merging?
>>>>>> @Ahmet/Robert/others please share if there are any other blockers.
>>>>>> >>>>>> >
>>>>>> >>>>>> >
>>>>>> >>>>>> > [1] https://github.com/gohugoio/hugo/pull/4494
>>>>>> >>>>>> >
>>>>>> >>>>>> >
>>>>>> >>>>>> > On Wed, May 6, 2020 at 10:19 AM Robert Bradshaw <
>>>>>> rober...@google.com> wrote:
>>>>>> >>>>>> >>
>>>>>> >>>>>> >> On Mon, May 4, 2020 at 7:07 PM Ahmet Altay <
>>>>>> al...@google.com> wrote:
>>>>>> >>>>>> >> >
>>>>>> >>>>>> >> >> On Mon, May 4, 2020 at 6:30 PM Robert Bradshaw <
>>>>>> rober...@google.com> wrote:
>>>>>> >>>>>> >> >>>
>>>>>> >>>>>> >> >>> I took the massive commit and split it up into:
>>>>>> >>>>>> >> >>>
>>>>>> >>>>>> >> >>> (1) Infrastructure changes (basically everything
>>>>>> outside of
>>>>>> >>>>>> >> >>> (website/www/site/content)
>>>>>> >>>>>> >> >>> (2) Sed script changes, and
>>>>>> >>>>>> >> >>> (3) Manual changes (everything not in (1) and (2)).
>>>>>> >>>>>> >> >
>>>>>> >>>>>> >> >
>>>>>> >>>>>> >> > Thank you Robert. This makes it much easier. What is the
>>>>>> source of the sed script? I am not sure why some of those lines are 
>>>>>> there.
>>>>>> It would be much easier for us to comment on the script source if it is
>>>>>> reviewable somewhere.
>>>>>> >>>>>> >>
>>>>>> >>>>>> >> I just gathered up common patterns as I was trying to go
>>>>>> through and
>>>>>> >>>>>> >> review the files... Mostly it was an exercise in finding a
>>>>>> compact
>>>>>> >>>>>> >> representation for the delta, not trying to be a perfect
>>>>>> conversion.
>>>>>> >>>>>> >> (I do think in retrospect, if we do something like this
>>>>>> again, it
>>>>>> >>>>>> >> would be preferable to commit a script that does the
>>>>>> auto-conversion
>>>>>> >>>>>> >> (maybe even with some patch files for manual changes) both
>>>>>> for ease of
>>>>>> >>>>>> >> reviewing and to avoid the stop-the-world situation we're
>>>>>> in now. (I'm
>>>>>> >>>>>> >> still worried that some changes will get lost in the
>>>>>> shuffle.)
>>>>>>
>>>>>

Reply via email to