[ 
https://issues.apache.org/jira/browse/HADOOP-9902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13784664#comment-13784664
 ] 

Allen Wittenauer commented on HADOOP-9902:
------------------------------------------

These are sort of out of order.

bq. playing with this. sometimes the generated classpath is, say, 
share/hadoop/yarn/* ; the capacity scheduler is /*.jar - should everything be 
consistent?

At one point I thought about processing the regex string to dedupe it down to 
the individual jar level. This opens up a big can of worms, however: if you hit 
two copies of the same jar, do you always take the latest?  What does "latest" 
even mean (date or version)?  Will we be able to parse the version out of the 
filename?  How do we deal with user overrides?  Still take the latest no matter 
what?

I've opted to basically let the classpath stand as it is passed to us.  
Currently the dedupe code is pretty fast for interpreted shell. :) The *only* 
sub-optimization that I might be tempted to do is to normalize any symlinks and 
relative paths.  There is a good chance we'd catch a few dupes that way... but 
it likely isn't worth the extra execution time.
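
Just to sketch what I mean (the function name here is made up, and GNU 
readlink -f obviously isn't portable everywhere):

{code}
hadoop_normalize_classpath_entry ()
{
  # sketch only (not in the patch): resolve symlinks and relative components
  # so duplicate entries collapse to the same string before the dedupe check.
  # wildcard entries like share/hadoop/yarn/* are passed through untouched.
  local entry=$1
  local resolved

  if [[ -e "${entry}" ]]; then
    resolved=$(readlink -f "${entry}" 2>/dev/null)
  fi
  echo "${resolved:-${entry}}"
}
{code}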

It's worth pointing out that a user can feasibly replace the add_classpath code 
in hadoop-env.sh to override the functionality without changing the base Apache 
code, if they want or need more advanced classpath handling. (e.g., HADOOP-6997 
seems like a non-issue to me, since passing duplicate class names is just bad 
practice; changing the collation is fixing a symptom of a much bigger, more 
dangerous problem. But someone facing this issue could theoretically fix a 
collation problem on their own, "legally" and in a stable way, using this 
trick.)
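
For example, something like this in hadoop-env.sh (purely hypothetical -- the 
function name and the assumption that hadoop-env.sh gets sourced after 
hadoop-functions.sh depend on what actually lands):

{code}
# hypothetical site-local override placed in hadoop-env.sh
hadoop_add_classpath ()
{
  local entry=$1

  # site policy: append only, and skip anything already present verbatim
  if [[ ":${CLASSPATH}:" != *":${entry}:"* ]]; then
    export CLASSPATH="${CLASSPATH:+${CLASSPATH}:}${entry}"
  fi
}
{code}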

bq. I don't see hadoop tools getting on the CP: is there a plan for that?

Tools path gets added as needed.  I seem to recall this is exactly the same way 
in the current shell scripts.
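
In other words, roughly this pattern (illustrative only; the variable names are 
placeholders):

{code}
# illustrative only: a subcommand that needs the tools jars picks them up
# on demand instead of the tools directory living on the classpath globally
case "${COMMAND}" in
  distcp|archive)
    CLASSPATH="${CLASSPATH:+${CLASSPATH}:}${HADOOP_PREFIX}/share/hadoop/tools/lib/*"
  ;;
esac
{code}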

bq. Because it would suit me to have a directory into which I could put things 
to get them on a classpath without playing with HADOOP_CLASSPATH

I was planning on bringing up this exact issue after I get this one committed.  
It's a harder discussion because the placement is tricky and there are a lot of 
options for making this functionality happen.  Do we add another env var?  Do 
we just auto-prepend $HADOOP_PREFIX/lib/share/site/*?  Do we offer both prepend 
and append options? etc, etc. All have pros and cons.  Some of the choices 
really only become feasible after this is committed, however.
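
To illustrate the auto-prepend flavor of that (the directory name is just the 
example above, not a decision):

{code}
# sketch of the auto-prepend option: anything dropped into a well-known
# site directory lands ahead of the stock classpath
if [[ -d "${HADOOP_PREFIX}/lib/share/site" ]]; then
  CLASSPATH="${HADOOP_PREFIX}/lib/share/site/*${CLASSPATH:+:${CLASSPATH}}"
fi
{code}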

bq. we do need to think when and how to react to (conf dir) absence

Good point.  That's pretty easy to add given that the conf dir handling is 
fairly well contained now in the hadoop_find_confdir function in 
hadoop-functions.sh.  It's pretty trivial to throw a fatal error if we don't 
detect, say, hadoop-env.sh in what we resolved HADOOP_CONF_DIR to.  Suggestions 
on what to check for?
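
Something along these lines, presumably (sketch only; the error/exit handling 
should follow whatever the rest of hadoop-functions.sh ends up doing):

{code}
# sanity check after hadoop_find_confdir has resolved HADOOP_CONF_DIR
if [[ ! -f "${HADOOP_CONF_DIR}/hadoop-env.sh" ]]; then
  echo "ERROR: Cannot find hadoop-env.sh in ${HADOOP_CONF_DIR}." 1>&2
  exit 1
fi
{code}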

bq. actually a rebuild fixes that. What I did have to do was drop 
hadoop-functions.sh into libexec

Yeah, after commit this is pretty much a flag day for all of the Hadoop 
subprojects. I talked to a few folks about it, and the general feeling was that 
this should be one big patch+JIRA rather than several smaller ones per project, 
given the interdependency on common.  We'll have to advertise on the various 
-dev mailing lists post-commit to say "do a full rebuild."  Hopefully, though, 
folks won't have to change their *-env.sh files and they will continue to work 
without modification.

Thanks!

> Shell script rewrite
> --------------------
>
>                 Key: HADOOP-9902
>                 URL: https://issues.apache.org/jira/browse/HADOOP-9902
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: scripts
>    Affects Versions: 2.1.1-beta
>            Reporter: Allen Wittenauer
>            Assignee: Allen Wittenauer
>         Attachments: hadoop-9902-1.patch, more-info.txt
>
>
> Umbrella JIRA for shell script rewrite.  See more-info.txt for more details.



--
This message was sent by Atlassian JIRA
(v6.1#6144)
