Re: Automatically Documenting Apache Hadoop Configuration

Harsh J Mon, 05 Dec 2011 11:23:00 -0800

I've seen Oozie do that same break-up of config param names and boy, it's 
difficult to grep in such a code base when troubleshooting.


OTOH, we at least get a sane prefix for relevant config names (hope we do?)

On 06-Dec-2011, at 12:44 AM, Robert Evans wrote:

> From my work on yarn trying to document the configs there and to standardize 
> them, writing anything that is going to automatically detect config values 
> through static analysis is going to be very difficult.  This is because most 
> of the configs in yarn are now built up using static string concatenation.
> 
> public static String BASE = "yarn.base.";
> public static String CONF = BASE+"config";
> 
> I am not sure that there is a good way around this short of using a full java 
> parser to trace out all method calls, and try to resolve the parameters.  I 
> know this is possible, just not that simple to do.
> 
> I am +1 for anything that will clean up configs and improve the documentation 
> of them.  Even if we have to rewire or rewrite a lot of the Configuration 
> class to make things work properly.
> 
> --Bobby Evans
> 
> On 12/5/11 11:54 AM, "Harsh J" <ha...@cloudera.com> wrote:
> 
> Praveen,
> 
> (Inline.)
> 
> On 05-Dec-2011, at 10:14 PM, Praveen Sripati wrote:
> 
>> Hi,
>> 
>> Recently there was a query about making the Hadoop framework tolerate
>> map/reduce task failures so that the job can still complete. The solution
>> was to set the 'mapreduce.map.failures.maxpercent' and
>> 'mapreduce.reduce.failures.maxpercent' properties. Although this feature
>> was introduced a couple of years back, it was not documented. I had a
>> similar experience with the 0.23 release.
> 
> I do not know if we recommend using config strings directly when there's an 
> API in Job/JobConf supporting setting the same thing. Just saying - that 
> there was javadoc already available on this. But of course, it would be 
> better if the tutorial covered this too. Doc-patches welcome!
> 
>> It would be really good for Hadoop adoption to automatically dig and
>> document all the existing configurable properties in Hadoop and also to
>> identify newly added properties in a particular release during the build
>> processes. Documentation would also lead to fewer queries in the forums.
>> Cloudera has done something similar [1]; though it's not 100% accurate, it
>> would definitely help to some extent.
> 
> I'm +1 for this. We do request and consistently add entries to *-default.xml 
> files if we find them undocumented today. I think we should also enforce it 
> at the review level, so that patches do not go in undocumented -- at minimum, 
> the configuration tweaks.
>
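
A possible workaround for the string-concatenation problem Robert describes 
above is to skip static analysis and let the JVM resolve the constants: load 
the class that holds the keys and read its public static String fields through 
reflection. The sketch below is only an illustration under stated assumptions. 
ConfigKeyDump, ExampleKeys and resolveKeys are hypothetical names, ExampleKeys 
is a stand-in for a real key-holder class, and the approach only finds keys 
that are declared as static String fields, not keys built inline at call sites.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.Map;
import java.util.TreeMap;

public class ConfigKeyDump {

    // Hypothetical stand-in for a class that declares config keys,
    // mirroring the BASE/CONF pattern quoted in the thread.
    public static class ExampleKeys {
        public static String BASE = "yarn.base.";
        public static String CONF = BASE + "config";
        public static int NOT_A_KEY = 42; // ignored: not a String field
    }

    // Returns field name -> resolved value for every public static String
    // field declared directly on the given class, sorted by field name.
    public static Map<String, String> resolveKeys(Class<?> holder) {
        Map<String, String> keys = new TreeMap<>();
        for (Field f : holder.getDeclaredFields()) {
            int mods = f.getModifiers();
            boolean candidate = Modifier.isPublic(mods)
                    && Modifier.isStatic(mods)
                    && f.getType() == String.class;
            if (!candidate) {
                continue;
            }
            try {
                // Static field, so no instance is needed. Reading the field
                // forces class initialization, so the concatenation has
                // already been evaluated by the time we see the value.
                keys.put(f.getName(), (String) f.get(null));
            } catch (IllegalAccessException e) {
                throw new IllegalStateException("Cannot read " + f.getName(), e);
            }
        }
        return keys;
    }

    public static void main(String[] args) {
        resolveKeys(ExampleKeys.class)
                .forEach((name, value) -> System.out.println(name + " = " + value));
    }
}
```

Comparing the resolved values against the property names already listed in the 
*-default.xml files would then point at keys that still lack documentation. 
Prefix constants such as BASE show up in the output too and would need to be 
filtered out by hand or by convention.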
