Alan found the place where website publishing is configured [1], which has
examples of project sites being configured with more than one git root.
This is great for us because it allows us to leave generated
javadocs/pydocs in the beam-site repository and publish website markdown
content from the main repo.

Alan has a PR ready to publish generated HTML in a post-commit job [2].
Once that goes through, the last step is to update the publishing config.

[1]
https://github.com/apache/infrastructure-puppet/blob/deployment/modules/gitwcsub/files/config/gitwcsub.cfg
[2] https://github.com/apache/beam/pull/6431
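
For illustration, the "more than one git root" idea can be sketched as a
longest-prefix lookup from site path to the repository and branch that serves
it. This is only a sketch under stated assumptions: the mapping, the pydoc
path, and the name git_root_for are hypothetical, and this is not the
gitwcsub.cfg syntax.

```python
# Hypothetical illustration only: this is NOT the gitwcsub.cfg format.
# Each URL prefix of the site maps to an assumed (repository, branch) pair.
SITE_ROOTS = {
    # Assumed home of the generated website HTML after the migration.
    "/": ("apache/beam", "asf-site"),
    # Generated API docs stay in the old repository (paths are assumptions).
    "/documentation/sdks/javadoc/": ("apache/beam-site", "asf-site"),
    "/documentation/sdks/pydoc/": ("apache/beam-site", "asf-site"),
}


def git_root_for(path):
    """Return the (repo, branch) whose prefix is the longest match for path."""
    # Every site path starts with "/", so at least the root prefix matches.
    matches = [prefix for prefix in SITE_ROOTS if path.startswith(prefix)]
    return SITE_ROOTS[max(matches, key=len)]
```

With this layout, a javadoc URL resolves to the beam-site root while every
other page resolves to the main repo.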

On Mon, Sep 24, 2018 at 4:35 PM Scott Wegner <sweg...@google.com> wrote:

> > We could add a new default branch (master?) and keep all the
> non-generated files (src/) there, and put generated files (content/) in the
> asf-site branch (like we already do).
>
> I'm strongly in favor of having sources in a single repository. We have
> significant process and infrastructure built up for the apache/beam repo
> (for build, PR, CI, release, etc.) that we can take advantage of by putting
> website sources in the same repo. The current beam-site repo PR automation
> is flaky because it was custom-built and not given the same level of
> attention as the main repo.
>
> The caveat to consolidating website sources in the main repo is that it
> incentivizes putting the generated sources branch on the same repo. I've
> documented a few of the reasons in the Appendix of the design doc [1]:
> - It's easier to maintain a single repository; easily apply existing
> tooling/infrastructure
> - Jenkins tooling for publishing generated HTML may not work cross-repo [2]
>
> My preference is to move forward with the migration of sources to
> apache/beam [master], and website generated HTML to apache/beam [asf-site].
> I like the idea of separating the publishing/hosting of generated
> javadocs/pydocs since they add so much cruft, but it should not hold up the
> migration.
>
> [1]
> https://docs.google.com/document/d/1lfbMhdIyDzIaBTgc9OUByhSwR94kfOzS_ozwKWTVl5U/edit#heading=h.wqwi2jpoiiuc
>
> [2]
> https://stackoverflow.com/questions/14843696/checkout-multiple-git-repos-into-same-jenkins-workspace
>
> On Mon, Sep 24, 2018 at 2:33 PM Udi Meiri <eh...@google.com> wrote:
>
>> Staying on beam-site SGTM. We could add a new default branch (master?)
>> and keep all the non-generated files (src/) there, and put generated files
>> (content/) in the asf-site branch (like we already do).
>> That way there's no confusion as to which files you should update.
>> (This is of course assuming we still place generated docs in git repos.)
>>
>> On Mon, Sep 24, 2018 at 11:23 AM Thomas Weise <t...@apache.org> wrote:
>>
>>> My thought was to leave the asf-site branch in the beam-site repository,
>>> add generated docs to that branch (until we have a better solution), and
>>> have only sources in the beam repo.
>>>
>>> Scott had filed https://issues.apache.org/jira/browse/BEAM-5459 -
>>> it would eliminate the need to place generated docs into git repos.
>>>
>>> On Mon, Sep 24, 2018 at 11:06 AM Udi Meiri <eh...@google.com> wrote:
>>>
>>>> I believe that beam.apache.org is populated from the asf-site branch
>>>> of the apache/beam-site repo. (gitpubsub:
>>>> https://www.apache.org/dev/project-site.html#intro)
>>>> If we move the markdown-based docs to apache/beam, leave generated
>>>> javadoc and pydoc in apache/beam-site, and point gitpubsub to apache/beam,
>>>> then javadoc and pydoc will not get pushed to the website.
>>>>
>>>> Is there some place where we can push javadoc and pydoc files? Or
>>>> perhaps is there an alternative way to push updates to beam.apache.org?
>>>> (not requiring the asf-site branch)
>>>>
>>>> On Fri, Sep 21, 2018 at 6:40 PM Thomas Weise <t...@apache.org> wrote:
>>>>
>>>>> Hi Scott,
>>>>>
>>>>> Thanks for bringing the discussion back here.
>>>>>
>>>>> I agree that we should separate the changes for hosting of generated
>>>>> java/pydocs from the rest of website automation so that we can make the
>>>>> switch and fix the contributor headache soon.
>>>>>
>>>>> But perhaps we can avoid adding 4m lines of generated code to the main
>>>>> beam repository (and keep on adding with every release) if we continue to
>>>>> serve the site from the old beam-site repo? (I left a comment in the doc.)
>>>>>
>>>>> About trying buildbot, as mentioned earlier I would be happy to help
>>>>> with it. I prefer a setup that keeps the docs separate from the web site.
>>>>>
>>>>> Thomas
>>>>>
>>>>>
>>>>> On Fri, Sep 21, 2018 at 10:28 AM Scott Wegner <sc...@apache.org>
>>>>> wrote:
>>>>>
>>>>>> Re-opening this thread as it came up today in the discussion for
>>>>>> PR#6458 [1]. This PR is part of the work for Beam-Site Automation
>>>>>> Reliability improvements; design doc here:
>>>>>> https://s.apache.org/beam-site-automation
>>>>>>
>>>>>> The current plan is to keep generated javadoc/pydoc sources only on
>>>>>> the asf-site branch, which is necessary for the current githubpubsub
>>>>>> publishing mechanism. This maintains our current approach, the only
>>>>>> change being that we're moving the asf-site branch from the retiring
>>>>>> apache/beam-site repository into a new apache/beam repo branch.
>>>>>>
>>>>>> The concern for committing generated content is the extra overhead
>>>>>> during git fetch. I did some analysis to measure the impact [2], and
>>>>>> found that fetching a week of source + generated content history from
>>>>>> apache/beam-site took 0.39 seconds.
>>>>>>
>>>>>> I like the idea of publishing javadoc/pydoc snapshots to an external
>>>>>> location like Flink does with buildbot, but that work is separable and
>>>>>> shouldn't be a prerequisite for this effort. The goal of this work is to
>>>>>> improve the reliability of automation for contributing website changes.
>>>>>> At last measure, only about half of beam-site PR merges use Mergebot
>>>>>> without experiencing some reliability issue [3].
>>>>>>
>>>>>> I've opened BEAM-5459 [4] to track moving our generated docs out of
>>>>>> git. Thomas, would you have bandwidth to look into this?
>>>>>>
>>>>>> [1] https://github.com/apache/beam/pull/6458#issuecomment-423406643
>>>>>> [2]
>>>>>> https://docs.google.com/document/d/1lfbMhdIyDzIaBTgc9OUByhSwR94kfOzS_ozwKWTVl5U/edit#heading=h.uqzivheohd7j
>>>>>> [3]
>>>>>> https://docs.google.com/document/d/1lfbMhdIyDzIaBTgc9OUByhSwR94kfOzS_ozwKWTVl5U/edit#heading=h.a208cwi78xmu
>>>>>> [4] https://issues.apache.org/jira/browse/BEAM-5459
>>>>>>
>>>>>> On Fri, Aug 24, 2018 at 11:48 AM Thomas Weise <t...@apache.org> wrote:
>>>>>>
>>>>>>> Hi Udi,
>>>>>>>
>>>>>>> Good to know you will continue this work.
>>>>>>>
>>>>>>> Let me know if you want to try the buildbot route (which does not
>>>>>>> require generated documentation to be checked into the repo). Happy to
>>>>>>> help with that.
>>>>>>>
>>>>>>> Thomas
>>>>>>>
>>>>>>> On Fri, Aug 24, 2018 at 11:36 AM Udi Meiri <eh...@google.com> wrote:
>>>>>>>
>>>>>>>> I'm picking up the website migration. The plan is to not include
>>>>>>>> generated files in the master branch.
>>>>>>>>
>>>>>>>> However, I've been told that even putting generated files in a
>>>>>>>> separate branch could blow up the git repository for all (e.g. make git
>>>>>>>> pulls a lot longer?).
>>>>>>>> Not sure if this is a real issue or not.
>>>>>>>>
>>>>>>>> On Mon, Aug 20, 2018 at 2:53 AM Robert Bradshaw <
>>>>>>>> rober...@google.com> wrote:
>>>>>>>>
>>>>>>>>> On Sun, Aug 5, 2018 at 5:28 AM Thomas Weise <t...@apache.org>
>>>>>>>>> wrote:
>>>>>>>>> >
>>>>>>>>> > Yes, I think the separation of generated code will need to occur
>>>>>>>>> > prior to completing the merge and switching the web site to the main
>>>>>>>>> > repo.
>>>>>>>>> >
>>>>>>>>> > There should be no reason to check generated documentation into
>>>>>>>>> > either of the repos/branches.
>>>>>>>>>
>>>>>>>>> Huge +1 to this. Thomas, would you have time to set something like
>>>>>>>>> this up for Beam? If not, could anyone else pick this up?
>>>>>>>>>
>>>>>>>>> > Please see as an example how this was solved in Flink, using the
>>>>>>>>> > ASF buildbot infrastructure.
>>>>>>>>> >
>>>>>>>>> > Documentation per version/release, for example:
>>>>>>>>> >
>>>>>>>>> > https://ci.apache.org/projects/flink/flink-docs-release-1.5/
>>>>>>>>> >
>>>>>>>>> > The buildbot configuration is here (requires committer access):
>>>>>>>>> >
>>>>>>>>> >
>>>>>>>>> https://svn.apache.org/repos/infra/infrastructure/buildbot/aegis/buildmaster/master1/projects/flink.conf
>>>>>>>>> >
>>>>>>>>> > Thanks,
>>>>>>>>> > Thomas
>>>>>>>>> >
>>>>>>>>> > On Thu, Aug 2, 2018 at 6:46 PM Mikhail Gryzykhin <
>>>>>>>>> mig...@google.com> wrote:
>>>>>>>>> >>
>>>>>>>>> >> Last time I talked with Scott I brought this idea up. I believe
>>>>>>>>> >> the plan was either to publish the compiled site to the website
>>>>>>>>> >> directly, or keep it in storage separate from the apache/beam repo.
>>>>>>>>> >>
>>>>>>>>> >> One of the main reasons not to check in a compiled version of the
>>>>>>>>> >> website is that every developer would have to pull all the versions of
>>>>>>>>> >> the website every time they clone the repo, which is not a good idea.
>>>>>>>>> >>
>>>>>>>>> >> Regards,
>>>>>>>>> >> --Mikhail
>>>>>>>>> >>
>>>>>>>>> >>
>>>>>>>>> >> On Thu, Aug 2, 2018 at 6:42 PM Udi Meiri <eh...@google.com>
>>>>>>>>> wrote:
>>>>>>>>> >>>
>>>>>>>>> >>> Pablo, the docs are generated into versioned paths, e.g.,
>>>>>>>>> >>> https://beam.apache.org/documentation/sdks/javadoc/2.5.0/ so tags
>>>>>>>>> >>> are not necessary?
>>>>>>>>> >>> Also, once apache/beam-site is merged with apache/beam the
>>>>>>>>> >>> release branch should have the relevant docs (although perhaps it's
>>>>>>>>> >>> better to put them in a different repo or storage system).
>>>>>>>>> >>>
>>>>>>>>> >>> Thomas, I would very much like to not have javadoc/pydoc
>>>>>>>>> >>> generation be part of the website review process, as it takes up a
>>>>>>>>> >>> lot of time when changes are staged (10s of thousands of files),
>>>>>>>>> >>> especially when a PR is updated and existing staged files need to be
>>>>>>>>> >>> deleted.
>>>>>>>>> >>>
>>>>>>>>> >>>
>>>>>>>>> >>> On Thu, Aug 2, 2018 at 1:15 PM Mikhail Gryzykhin <
>>>>>>>>> mig...@google.com> wrote:
>>>>>>>>> >>>>
>>>>>>>>> >>>> +1 For removing old documentation.
>>>>>>>>> >>>>
>>>>>>>>> >>>> @Thomas: Migration work is in the backlog and will be picked up
>>>>>>>>> >>>> soon.
>>>>>>>>> >>>>
>>>>>>>>> >>>> --Mikhail
>>>>>>>>> >>>>
>>>>>>>>> >>>>
>>>>>>>>> >>>> On Thu, Aug 2, 2018 at 12:54 PM Thomas Weise <t...@apache.org>
>>>>>>>>> wrote:
>>>>>>>>> >>>>>
>>>>>>>>> >>>>> +1 for removing pre 2.0 documentation (as well as the
>>>>>>>>> >>>>> entries from https://beam.apache.org/get-started/downloads/)
>>>>>>>>> >>>>>
>>>>>>>>> >>>>> Isn't it part of the beam-site changes that we will no
>>>>>>>>> >>>>> longer check generated documentation into the repository? Those
>>>>>>>>> >>>>> can be generated and deployed independently (when a commit to a
>>>>>>>>> >>>>> branch occurs), as is done in the Apex and Flink projects.
>>>>>>>>> >>>>>
>>>>>>>>> >>>>> I was told that Scott, who was working on the beam-site
>>>>>>>>> >>>>> changes, is on leave now and the migration is still pending (see note
>>>>>>>>> >>>>> at https://github.com/apache/beam/tree/master/website). Is anyone
>>>>>>>>> >>>>> else going to pick it up?
>>>>>>>>> >>>>>
>>>>>>>>> >>>>> Thanks,
>>>>>>>>> >>>>> Thomas
>>>>>>>>> >>>>>
>>>>>>>>> >>>>>
>>>>>>>>> >>>>> On Thu, Aug 2, 2018 at 12:33 PM Pablo Estrada <
>>>>>>>>> pabl...@google.com> wrote:
>>>>>>>>> >>>>>>
>>>>>>>>> >>>>>> Is it worth adding a tag / branch to the repositories every
>>>>>>>>> >>>>>> time we make a release, so that people are able to dive in and find
>>>>>>>>> >>>>>> the docs?
>>>>>>>>> >>>>>> Best
>>>>>>>>> >>>>>> -P.
>>>>>>>>> >>>>>>
>>>>>>>>> >>>>>> On Thu, Aug 2, 2018 at 12:09 PM Ahmet Altay <
>>>>>>>>> al...@google.com> wrote:
>>>>>>>>> >>>>>>>
>>>>>>>>> >>>>>>> I would guess that users are still using some of these old
>>>>>>>>> >>>>>>> releases. It is unclear from the Beam website which releases are
>>>>>>>>> >>>>>>> still supported. It probably makes sense to drop documentation for
>>>>>>>>> >>>>>>> releases < 2.0. (I would suggest keeping docs for 2.0). For the
>>>>>>>>> >>>>>>> future I can work on updating the Beam website to clarify the state
>>>>>>>>> >>>>>>> of each release.
>>>>>>>>> >>>>>>>
>>>>>>>>> >>>>>>> On Thu, Aug 2, 2018 at 12:06 PM, Udi Meiri <
>>>>>>>>> eh...@google.com> wrote:
>>>>>>>>> >>>>>>>>
>>>>>>>>> >>>>>>>> The older docs are not directly linked to and are in
>>>>>>>>> >>>>>>>> GitHub commit history.
>>>>>>>>> >>>>>>>>
>>>>>>>>> >>>>>>>> If there are no objections I'm going to delete javadocs
>>>>>>>>> >>>>>>>> and pydocs for releases older than 1 year,
>>>>>>>>> >>>>>>>> meaning 2.0.0 and older (going by the dates here).
>>>>>>>>> >>>>>>>>
>>>>>>>>> >>>>>>>> On Thu, Aug 2, 2018 at 11:51 AM Daniel Oliveira <
>>>>>>>>> danolive...@google.com> wrote:
>>>>>>>>> >>>>>>>>>
>>>>>>>>> >>>>>>>>> The older docs should be recorded in the commit history
>>>>>>>>> >>>>>>>>> of the website repository, right? If they're not currently used in
>>>>>>>>> >>>>>>>>> the website and they're in the commit history then I don't see a
>>>>>>>>> >>>>>>>>> reason to save them.
>>>>>>>>> >>>>>>>>>
>>>>>>>>> >>>>>>>>> On Tue, Jul 31, 2018 at 1:51 PM Udi Meiri <
>>>>>>>>> eh...@google.com> wrote:
>>>>>>>>> >>>>>>>>>>
>>>>>>>>> >>>>>>>>>> Hi all,
>>>>>>>>> >>>>>>>>>> I'm writing a PR for apache/beam-site and
>>>>>>>>> >>>>>>>>>> beam_PreCommit_Website_Stage is timing out after 100 minutes, because
>>>>>>>>> >>>>>>>>>> it's trying to delete 22k files and then copy 22k files (warning:
>>>>>>>>> >>>>>>>>>> large file).
>>>>>>>>> >>>>>>>>>>
>>>>>>>>> >>>>>>>>>> It seems that we could save a lot of time by deleting
>>>>>>>>> >>>>>>>>>> the javadoc and pydoc files for older versions. Is there a good
>>>>>>>>> >>>>>>>>>> reason to keep around this kind of documentation for older versions
>>>>>>>>> >>>>>>>>>> (say 1 year back)?
>>>>>>>>> >>>>>>>
>>>>>>>>> >>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>
>>>>>> --
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> Got feedback? tinyurl.com/swegner-feedback
>>>>>>
>>>>>
>


-- 




Got feedback? tinyurl.com/swegner-feedback
