That's why a package system tends to be a complex beast, with a dependency tree
between packages and so on: you'd have a hadoop-common package, plus hadoop-auth
and hadoop-hdfs packages that depend on it. But I don't know if we want to go
there; package management is not Solr's core business.
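To make the "dependency tree" point concrete, here is a minimal sketch of the install-order problem a package manager has to solve. It assumes a hypothetical package graph in which hadoop-auth and hadoop-hdfs both depend on hadoop-common, and uses a plain depth-first topological sort. It is not how Solr's package manager actually works; `PackageOrder` and `installOrder` are illustrative names.

```java
import java.util.*;

/** Hypothetical sketch: resolve an install order for packages that depend on each other. */
public class PackageOrder {

    /**
     * Returns packages in an order where every dependency comes before its dependents
     * (depth-first topological sort). Throws if the graph contains a cycle.
     */
    public static List<String> installOrder(Map<String, List<String>> deps) {
        List<String> order = new ArrayList<>();
        Set<String> done = new HashSet<>();
        Set<String> inProgress = new HashSet<>();
        // Iterate in sorted order so the result is deterministic.
        for (String pkg : new TreeSet<>(deps.keySet())) {
            visit(pkg, deps, done, inProgress, order);
        }
        return order;
    }

    private static void visit(String pkg, Map<String, List<String>> deps,
                              Set<String> done, Set<String> inProgress, List<String> order) {
        if (done.contains(pkg)) {
            return;
        }
        if (!inProgress.add(pkg)) {
            // We came back to a package that is still being resolved.
            throw new IllegalStateException("Dependency cycle at " + pkg);
        }
        for (String dep : deps.getOrDefault(pkg, List.of())) {
            visit(dep, deps, done, inProgress, order);
        }
        inProgress.remove(pkg);
        done.add(pkg);
        order.add(pkg); // all of pkg's dependencies are already in 'order'
    }

    public static void main(String[] args) {
        // Hypothetical package graph matching the example in the text.
        Map<String, List<String>> deps = Map.of(
                "hadoop-common", List.of(),
                "hadoop-auth", List.of("hadoop-common"),
                "hadoop-hdfs", List.of("hadoop-common"));
        System.out.println(installOrder(deps));
    }
}
```

Running `main` prints `[hadoop-common, hadoop-auth, hadoop-hdfs]`. A real package system also has to handle version constraints, upgrades and removal, which this sketch ignores.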

Another thing to remember: once we factor out Hadoop as a module (contrib), we
may want to upgrade some common dependencies in solr-core that were locked to
old versions because of Hadoop. But if a user then drops the module jars (and
their dependencies) into SOLR_HOME/lib/ or similar, there will be jar version
conflicts between the module and core. If the module is loaded through the
package manager instead, it gets classloader isolation and is more likely to
work.
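A minimal sketch of what classloader isolation means here, assuming a module is loaded through its own `URLClassLoader` whose parent is the JDK platform class loader rather than the application loader that holds solr-core's jars. The module then resolves libraries only from its own jar list, so its copy of a library cannot collide with the version in WEB-INF/lib. This only illustrates the JDK mechanism; Solr's package manager builds its own loaders and may choose a different parent, and `ModuleIsolation`, `isolatedLoader` and `canLoad` are hypothetical names.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Path;
import java.util.List;

/** Hypothetical sketch: give a module its own class loader, isolated from core's classpath. */
public class ModuleIsolation {

    /**
     * Builds a loader that sees only the given module jars plus JDK classes.
     * Its parent is the platform class loader, so classes on the application
     * classpath (e.g. solr-core's dependencies) are not visible to it.
     */
    public static URLClassLoader isolatedLoader(List<Path> moduleJars) throws Exception {
        URL[] urls = new URL[moduleJars.size()];
        for (int i = 0; i < urls.length; i++) {
            urls[i] = moduleJars.get(i).toUri().toURL();
        }
        return new URLClassLoader(urls, ClassLoader.getPlatformClassLoader());
    }

    /** Returns true if the given loader can resolve the named class. */
    public static boolean canLoad(ClassLoader loader, String className) {
        try {
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        // No module jars in this demo: the point is what the loader can NOT see.
        try (URLClassLoader loader = isolatedLoader(List.of())) {
            // JDK classes resolve through the platform/bootstrap parents: true
            System.out.println(canLoad(loader, "java.util.List"));
            // This class was loaded by the application (or launcher) loader,
            // which the isolated loader does not delegate to: false
            System.out.println(canLoad(loader, ModuleIsolation.class.getName()));
        }
    }
}
```

The trade-off is that anything the module must share with core (for example Solr's plugin interfaces) then has to be made visible deliberately, typically by choosing a parent loader that exposes exactly those shared classes.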

I don't have a list of such potential crashes, and I hear that newer versions
of Hadoop are better and use shading for some dependencies, but whoever
prepares the module should thoroughly check the resulting
modules/<hadoop-foo>/lib/ folder and cross-check it against the jars in
WEB-INF/lib/ to look for trouble; perhaps there are workarounds.
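One way to do that cross-check mechanically, as a sketch: parse the jar file names in both folders into artifact name and version, and report artifacts that appear in both with different versions. It assumes jars follow the usual `artifactId-version.jar` naming, where the version segment starts with a digit; names that don't fit are skipped. `JarConflicts` is a hypothetical helper and the directory arguments are placeholders.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.*;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: find artifacts whose versions differ between a module and core. */
public class JarConflicts {

    // artifactId (shortest match), then "-", then a version starting with a digit, then ".jar"
    private static final Pattern JAR = Pattern.compile("(.+?)-(\\d.*)\\.jar");

    /** Maps artifactId -> version for every jar name that fits the naming pattern. */
    public static Map<String, String> parse(Collection<String> jarNames) {
        Map<String, String> result = new TreeMap<>();
        for (String name : jarNames) {
            Matcher m = JAR.matcher(name);
            if (m.matches()) {
                result.put(m.group(1), m.group(2));
            }
        }
        return result;
    }

    /** Returns one line per artifact present in both lists with different versions. */
    public static List<String> conflicts(Collection<String> moduleJars, Collection<String> coreJars) {
        Map<String, String> module = parse(moduleJars);
        Map<String, String> core = parse(coreJars);
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, String> e : module.entrySet()) {
            String coreVersion = core.get(e.getKey());
            if (coreVersion != null && !coreVersion.equals(e.getValue())) {
                out.add(e.getKey() + ": module " + e.getValue() + " vs core " + coreVersion);
            }
        }
        return out;
    }

    /** Lists the *.jar file names directly inside a directory. */
    private static List<String> jarNames(Path dir) throws IOException {
        List<String> names = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, "*.jar")) {
            for (Path p : stream) {
                names.add(p.getFileName().toString());
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java JarConflicts <module-lib-dir> <core-WEB-INF-lib-dir>");
            return;
        }
        for (String line : conflicts(jarNames(Paths.get(args[0])), jarNames(Paths.get(args[1])))) {
            System.out.println(line);
        }
    }
}
```

This only compares file names, so dependencies that Hadoop shades (relocates inside its own jars) won't show up; that is fine, since shaded copies are exactly the ones that should not conflict.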

Jan

> On 20 Jan 2022, at 23:12, Kevin Risden <[email protected]> wrote:
> 
> Yeah, it would be duplicate jars in both places. It is a shame that the two
> features, filesystem and authentication, share the name "hadoop". They end
> up being two entirely different things, both in Hadoop itself and inside
> Solr.
> 
> Kevin Risden
> 
> 
> On Thu, Jan 20, 2022 at 4:58 PM David Smiley <[email protected]> wrote:
> Separate modules mean our distro will end up duplicating hadoop-common and
> other related JARs in both modules. I was trying to be practical, but it's
> not important to me; ok.
> dependencies {
>   implementation ('org.apache.hadoop:hadoop-common') { transitive = false } // too many to ignore
>   implementation ('org.apache.hadoop:hadoop-annotations')
>   runtimeOnly 'org.apache.htrace:htrace-core4' // note: removed in Hadoop 3.3.2
>   runtimeOnly "org.apache.commons:commons-configuration2"
> }
> ~ David Smiley
> Apache Lucene/Solr Search Developer
> http://www.linkedin.com/in/davidwsmiley 
> 
> On Thu, Jan 20, 2022 at 4:55 PM Kevin Risden <[email protected]> wrote:
> My preference would be a separate module, HadoopAuthentication or something
> like that. HDFS support (the filesystem, block cache, etc.) is unique and
> separate from the authentication part. It shouldn't all be in one module.
> 
> Kevin Risden
> 
> 
> On Thu, Jan 20, 2022 at 4:48 PM David Smiley <[email protected]> wrote:
> The issue https://issues.apache.org/jira/browse/SOLR-14660 is about moving
> the HDFS plugins out of core into a module. While that's a great thing, it
> still leaves quite a few Hadoop-related dependencies in solr-core, because
> Hadoop is there not only for HDFS but also for some exotic authentication &
> authorization plugins. In that JIRA issue I proposed that this module be
> named "hadoop" and contain all Hadoop-related plugins.
> 
> As a quick experiment, I commented out the hadoop-auth dependency and tried 
> to compile to see what the compiler caught. It exposed the following two Solr 
> plugins:
> * HadoopAuthPlugin
> * KerberosPlugin
> 
> Are we okay with expanding the scope of SOLR-14660 to include these?
> 
> Note that SOLR-14660 *might* result in 9.0 not including this module in the 
> release distribution if we don't feel the module will be sufficiently ready 
> to release.
> 
> ~ David Smiley
> Apache Lucene/Solr Search Developer
> http://www.linkedin.com/in/davidwsmiley 
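To gauge the duplication David raises if hadoop-common and related JARs ship in both a filesystem module and an authentication module, here is a small sketch that lists the jar file names present in both modules' lib folders and adds up their size. `SharedJars` is a hypothetical helper, and the module directory names in the usage comment are placeholders for whatever layout the distribution ends up with.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch: find jars that two module lib folders would both ship. */
public class SharedJars {

    /** Returns the jar names that appear in both collections, sorted. */
    public static Set<String> shared(Collection<String> a, Collection<String> b) {
        Set<String> result = new TreeSet<>(a);
        result.retainAll(b); // keep only names also present in b
        return result;
    }

    /** Lists the *.jar file names directly inside a directory. */
    private static List<String> jarNames(Path dir) throws IOException {
        List<String> names = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, "*.jar")) {
            for (Path p : stream) {
                names.add(p.getFileName().toString());
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        // Usage (placeholder paths): java SharedJars modules/hadoop-fs/lib modules/hadoop-auth/lib
        if (args.length != 2) {
            System.err.println("usage: java SharedJars <module-a-lib-dir> <module-b-lib-dir>");
            return;
        }
        Path dirA = Paths.get(args[0]);
        Path dirB = Paths.get(args[1]);
        long totalBytes = 0;
        for (String name : shared(jarNames(dirA), jarNames(dirB))) {
            long size = Files.size(dirA.resolve(name));
            totalBytes += size;
            System.out.println(name + " (" + size + " bytes)");
        }
        System.out.println("Duplicated bytes: " + totalBytes);
    }
}
```

Matching on exact file names treats two different versions of the same artifact as distinct jars, which is usually what you want when measuring what the distribution physically duplicates.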
