On 1/3/14, 10:55 AM, Sean Busbey wrote:
Heya!

Earlier this week we had a user in IRC that was having difficulty running
1.5.0 because their classpath didn't include commons-configuration.

In one case, they just needed to fix their accumulo-site to include the Hadoop
2 paths. In the other, they were using Apache Hadoop 0.20.2, which has no
commons-configuration.
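
For anyone hitting the same symptom, one quick way to confirm whether
commons-configuration is visible on a given classpath is a small probe like
the one below. This is only a sketch and not part of Accumulo: the class name
ClasspathProbe and its isPresent helper are made up for illustration, and the
default class it looks for, org.apache.commons.configuration.Configuration,
assumes the commons-configuration 1.x package layout.

```java
final class ClasspathProbe {
    // Hypothetical helper: returns true if the named class can be located
    // by the class loader that loaded this probe.
    static boolean isPresent(String className) {
        try {
            // initialize=false: only locate the class, do not run its static initializers
            Class.forName(className, false, ClasspathProbe.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // Not found, or found but unusable (e.g. a missing dependency of its own)
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumed default: the main interface of commons-configuration 1.x
        String name = args.length > 0
                ? args[0]
                : "org.apache.commons.configuration.Configuration";
        System.out.println(name + (isPresent(name) ? " found" : " NOT found"));
    }
}
```

Compile it and run it with the same classpath the Accumulo processes would
see; passing a different class name as the first argument checks for that
class instead.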

Initially, the user thought they were running a CDH3 version. This turned
out not to be the case, but it so happens that CDH3 also does not have
commons-configuration provided by Hadoop.

This interaction pointed out two issues, and I'd like some opinions on how to
handle them before I file JIRAs and possibly patches.

1) We are not sufficiently warning people about the need for durable sync

Or maybe we're just not getting across when durable sync is available.
Hadoop versions are nonsensical for most outsiders, so I think we need to
spell it out in docs. Waiting for users to start an instance and then look
at a log is insufficient.

I recently did:

https://issues.apache.org/jira/browse/ACCUMULO-1637
https://issues.apache.org/jira/browse/ACCUMULO-1946

If these are still lacking, then we can open tickets for the omissions. I thought I had tracked down everything for Apache Hadoop pretty well and got appropriate checks for durable.sync/append and synconclose.

(talking to Sean on IRC) We can make a ticket to add stronger warnings about 0.20 releases not supporting append/sync correctly and how you can/will lose data.

I'm thinking we need something similar to what HBase has[1].

My question is, where should I add this? The README seems like a good
place, since it already talks about enabling durable sync. How about the
user manual? Both?

Both is probably good. I don't think we have anything on Hadoop versions in the user manual (or the administration manual, if that's still a thing).

2) Should we document commons-configuration similar to commons-io?

The README already has a section about how some older versions of Hadoop
don't have commons-io. I think the versions given need to be tightened up
given (1) above (since right now it implicitly refers to versions people
should not be using).

The only Hadoop distro I know of that both has proper append support and
does not have commons-configuration is CDH3. In addition to being a
vendor-specific version, it is no longer supported by said vendor.

So would it be preferable to

   2a) add a note after the commons-io section that gives similar
instructions for adding commons-configuration?

   2b) file a JIRA that points out that users on CDH3 won't have
commons-configuration, document the workaround on said ticket, and close it
as Won't Fix

The idea with the latter approach is that it would give searchers a chance
to find the information and give us somewhere to point people, while not
adding to our long-term documentation baggage. The downside is that this
won't be as accessible to users, so it will be more painful for them (especially
if they don't have regular internet access).

I'm not sure of what's best to do here. 1.6 undid the provided scope on those dependencies because 1.5 was such a pain to deal with in this regard (at least that's how I remember it). Perhaps a Jira is a good reference point and we can link to the ticket which made that change in 1.6. I doubt most users will find that on their own, but perhaps some might and it at least would keep us from having to repeat the same answer.


-Sean

[1]: http://hbase.apache.org/book/configuration.html#hadoop.older.versions
