FYI: I started an Apache wiki page with an outline of existing docs and current 
thoughts/status about how to disentangle the parts that overlap with 
CDH-specific or CM-specific portions:

https://cwiki.apache.org/confluence/display/IMPALA/Documentation

Probably people who are interested in that side of the project should put a 
watch on that page.

I guess we should also think about things related to performance / planning / 
sizing.  A considerable amount of content from the “Impala Performance 
Cookbook” is reproduced in the docs.  Is that all appropriate to donate to 
Apache?  Or are things that are mainly “observations from / advice to the FCE 
group” appropriate to keep more on the Cloudera side?  Probably we’ll need to 
look at that in closer detail.  I’m talking about pages like:

http://www.cloudera.com/content/www/en-us/documentation/enterprise/latest/topics/impala_cluster_sizing.html
http://www.cloudera.com/content/www/en-us/documentation/enterprise/latest/topics/impala_schema_design.html
http://www.cloudera.com/content/www/en-us/documentation/enterprise/latest/topics/impala_perf_skew.html
http://www.cloudera.com/content/www/en-us/documentation/enterprise/latest/topics/impala_scalability.html

For the moment, my expectation is that the doc material that is donated to 
Apache will continue to be reproduced (perhaps with CDH-specific bonus content) 
in the main Cloudera docs.  There is some degree of customer expectation that 
we’ve set, and I don’t see a good way to remove the material entirely without 
causing a lot of disruption.  E.g. readers and searchers would find many 
instances of older Impala docs on cloudera.com, but not the latest ones.  I 
don’t think we could physically remove the Impala doc content during the life 
of the CDH 5.x series; I don’t know if it would be practical right at the 
launch of CDH (N>5).  That’s something we can revisit as we develop more 
concrete ideas.

John

> On Jan 4, 2016, at 6:55 AM, Tom White wrote:
> 
> On Thu, Dec 31, 2015 at 5:17 AM, John Russell wrote:
>> 
>> I would say there's a fair bit of decision-making and followup work having 
>> to do with documentation.
> 
> Agreed. This doesn't need to be tied up with the podling report
> though, so I've changed the subject line to reflect that.
> 
>> 
>> For example, the current Impala docs that are embedded within the Cloudera 
>> doc library cover a wide range of subjects:
>> 
>> - "How to use Impala with <component XYZ>".  For example, Impala with 
>> Sentry, Impala with HBase, Impala with S3, Impala with Isilon...  Some 
>> components are Hadoop-based, others are more specific to what's shipped or 
>> integrated with CDH.  I feel like we should have a spreadsheet because these 
>> seem like decisions to make on a case-by-case basis.
>> 
>> - "How to do <task XYZ> with Impala".  Performance tuning, troubleshooting, 
>> deployment planning.  Same kinds of considerations as the previous bullet.  
>> Many of these aren't strictly part of core Impala features, rather they're 
>> things that could have been delivered via blog posts, O'Reilly books, etc.  
>> Again, there could be some amount of identifying / deciding / untangling to 
>> produce the right subset to go in Apache-oriented docs.
>> 
>> - "How to do <task XYZ> with Impala in Cloudera Manager".  That seems like 
>> an easy call to say, that kind of stuff doesn't get donated to Apache 
>> because it's CDH-specific.  That kind of content though is intermixed with 
>> "how to do <task XYZ> _without_ Cloudera Manager" so it would be some work 
>> to untangle instructions like that.
>> 
>> - "CREATE TABLE" and similar language reference stuff.  Doesn't every SQL 
>> engine in the open source arena come with a language reference of one sort 
>> or another...  So I assume there has to be something either donated or 
>> created from scratch along those lines.  (Although my open source experience 
>> is with MySQL, where the docs are under a more restrictive license than the 
>> software, so I don't have exact precedents to go by.)
>> 
>> Assuming that some amount of existing CDH doc is donated, then for purposes 
>> of building, accepting contributions, etc. do we need to convert the content 
>> to some particular format or use some specific build system?  The doc 
>> content that I'm talking about is currently in XML, with a DTD (DITA) that 
>> can be built using an all-open-source toolchain.  The format and toolchain 
>> might be a little more heavyweight than on a lot of other Apache projects.
> 
> There's no mandated documentation system for projects at Apache, so
> using DITA shouldn't be a problem, especially since it can be built
> using an open source toolchain, as you point out. Having some
> instructions on how to build the docs would be useful if they don't
> already exist.
> 
>> 
>> The main advantage of the current format for the Impala doc library is ease 
>> of reuse.  So there's the question of whether Apache-donated doc, like the 
>> language reference, then _only_ exists in the context of the project site, or 
>> gets reused within the doc library on cloudera.com.  There are pros and cons 
>> either way.  Even if we centralize future docs on the impala.io site, so 
>> there isn't a new instance corresponding to each new CDH x.y release, there 
>> are still all the older instances of those pages from CDH 4.x, CDH 5.x, 
>> Impala 1.x, and Impala 2.x docs on cloudera.com.
>> 
>> I've been cogitating over these considerations the last few weeks, but no 
>> approach has really jumped out at me as a slam dunk:
>> 
>> a) Rip as much existing doc out of the Cloudera library as possible, convert 
>> to the most contributor-friendly format, decouple entirely from the CDH 
>> library?
>> b) Donate core Impala feature docs only, keep the XML format the same, 
>> encourage verbatim reuse of doc content across CDH and other distributions 
>> that include Impala?
> 
> I would vote for a combination of a and b - donate all
> non-CDH-specific doc, and keep the existing XML format (DITA).
> 
> Cheers,
> Tom
> 
>> c) Some middle ground?  For example, it would be possible to mix and match 
>> the current XML doc format with user-contributed content in Markdown format.
>> 
>> Thanks,
>> John
>> 
>>> On Dec 30, 2015, at 3:07 PM, Henry Robinson wrote:
>>> 
>>> Hi all -
>>> 
>>> Here's a draft of our inaugural podling report. Per the usual guidelines,
>>> Impala has to submit three monthly reports to the Incubator PMC, after
>>> which we report every quarter. The purpose of the report is to expose the
>>> current state of the graduation effort to the Incubator, and to flag any
>>> problems that require Incubator attention.
>>> 
>>> I hope this report also sheds a little light on what needs to be done
>>> to move Impala's development in its entirety to the ASF and its
>>> infrastructure. We are looking forward to making quick progress on some of
>>> these items in 2016.
>>> 
>>> If anyone has any further comments or edits they'd like to make, please
>>> respond to this thread. I am on a short timeline as I fly internationally
>>> tomorrow and will be out of contact for about ten days, so I plan to post
>>> this to the Incubator wiki tomorrow morning. Any edits can then be made
>>> there.
>>> 
>>> Thanks,
>>> Henry
>>> 
>>> --------------------
>>> Impala
>>> 
>>> Impala is a high-performance C++ and Java SQL query engine for data stored in
>>> Apache Hadoop-based clusters.
>>> 
>>> Impala has been incubating since 2015-12-03.
>>> 
>>> Three most important issues to address in the move towards graduation:
>>> 
>>> 1. Resolve any issues around use of Gerrit as code-review tool.
>>> 2. Movement of existing JIRA / Git / wiki / e-mail resources to Apache
>>> equivalents.
>>> 3. Initial release as incubating project.
>>> 
>>> Any issues that the Incubator PMC (IPMC) or ASF Board wish/need to be
>>> aware of?
>>> 
>>> None.
>>> 
>>> How has the community developed since the last report?
>>> 
>>> Slowly - Impala is still in the very early stages of incubation, and
>>> performing the mechanical tasks of code movement and infrastructure setup
>>> is our first priority. The holiday period in the United States has slowed
>>> this effort slightly, but we look forward to picking up pace in early 2016.
>>> There have been no additions to the committer or PMC lists since incubation
>>> began.
>>> 
>>> How has the project developed since the last report?
>>> 
>>> We have performed some of the basic initial tasks for incubation -
>>> establishing wiki pages, Git repositories and accounts for the initial
>>> committer set. Our next steps are:
>>> 
>>> 1. Finalize the SGA from Cloudera
>>> 2. Move existing @cloudera.org e-mail aliases to their
>>> @impala.incubator.apache.org equivalents.
>>> 3. Move source code from Cloudera git repository to Apache git repo.
>>> 4. Improve out-of-box build and test experience so that community can
>>> easily evaluate release artifacts.
>>> 5. Migrate cloudera.org JIRA tickets to issues.apache.org.
>>> 
>>> 
>>> Date of last release:
>>> 
>>> NA
>>> 
>>> When were the last committers or PMC members elected?
>>> 
>>> At the time of the Incubation vote, 2015-12-03.
>> 
