That's why a package system tends to be a complex beast, with a dependency tree between packages and so on: you'd have a hadoop-common package, plus hadoop-auth and hadoop-hdfs packages that depend on it. But I don't know if we want to go there; package management is not Solr's core business.
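To make the dependency-tree point concrete: a package system has to install a shared package such as hadoop-common before the packages that depend on it. The following is a minimal Python sketch (3.9+) using the standard library's graphlib. The three package names come from the example above and are used purely as hypothetical graph nodes; this is not how Solr's package manager is implemented.

```python
from graphlib import TopologicalSorter

# Hypothetical package graph modeled on the example above: each key maps to
# the set of packages it depends on. These are illustrative names, not real
# Solr packages.
PACKAGE_DEPS = {
    "hadoop-common": set(),
    "hadoop-auth": {"hadoop-common"},
    "hadoop-hdfs": {"hadoop-common"},
}


def install_order(deps):
    """Return an order in which every package comes after its dependencies."""
    # static_order() raises graphlib.CycleError if the graph contains a cycle.
    return list(TopologicalSorter(deps).static_order())


if __name__ == "__main__":
    print(install_order(PACKAGE_DEPS))
```

Even this toy version has to handle a cycle (CycleError) somehow, which hints at why a full package system with versions and conflicts grows complex.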
Another thing to remember: once we factor Hadoop out as a module (contrib), we may want to upgrade the versions of certain common dependencies in solr-core that were locked to old versions because of Hadoop. But if a user then tries to drop the module JARs (and their dependencies) into SOLR_HOME/lib/ or similar, there will be JAR version conflicts between the module and core. Loading the module through the package manager, however, gives it classloader isolation, so that is more likely to succeed. I don't have a list of such potential crashes, and I hear that newer versions of Hadoop are better and shade some dependencies, but whoever prepares the module should do a thorough check of the resulting modules/<hadoop-foo>/lib/ folder and cross-check it against the JARs in WEB-INF/lib/ to look for trouble. Perhaps there are workarounds.

Jan

> On 20 Jan 2022, at 23:12, Kevin Risden wrote:
>
> Yeah, it would mean duplicate JARs in both places. It is a shame that the two features, filesystem and authentication, both share the name "hadoop". They end up being two entirely different things, both in Hadoop itself and inside Solr.
>
> Kevin Risden
>
>
> On Thu, Jan 20, 2022 at 4:58 PM David Smiley wrote:
> Separate modules will mean our distro will end up duplicating hadoop-common and other related JARs for both modules. I was trying to be practical. But it's not important to me; ok.
> implementation ('org.apache.hadoop:hadoop-common') { transitive = false } // too many to ignore
> implementation ('org.apache.hadoop:hadoop-annotations')
> runtimeOnly 'org.apache.htrace:htrace-core4' // note: removed in Hadoop 3.3.2
> runtimeOnly "org.apache.commons:commons-configuration2"
>
> ~ David Smiley
> Apache Lucene/Solr Search Developer
> http://www.linkedin.com/in/davidwsmiley
>
> On Thu, Jan 20, 2022 at 4:55 PM Kevin Risden wrote:
> My preference would be a separate HadoopAuthentication module (or something similarly named). The HDFS support (the filesystem, block cache, etc.) is unique and separate from the authentication part. It shouldn't all be in one module.
>
> Kevin Risden
>
>
> On Thu, Jan 20, 2022 at 4:48 PM David Smiley wrote:
> The issue https://issues.apache.org/jira/browse/SOLR-14660 is about moving the HDFS plugins out of core into a module. While that is a great thing, it still leaves quite a few Hadoop-related dependencies in solr-core, because Hadoop is there not only for HDFS; it's also there for some exotic authentication & authorization plugins. In that JIRA issue I proposed that this module be named "hadoop" and contain any Hadoop-related plugins.
>
> As a quick experiment, I commented out the hadoop-auth dependency and tried to compile to see what the compiler caught. It exposed the following two Solr plugins:
> * HadoopAuthPlugin
> * KerberosPlugin
>
> Are we okay with expanding the scope of SOLR-14660 to include these?
>
> Note that SOLR-14660 *might* result in 9.0 not including this module in the release distribution if we don't feel the module will be sufficiently ready to release.
>
> ~ David Smiley
> Apache Lucene/Solr Search Developer
> http://www.linkedin.com/in/davidwsmiley
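Jan's suggestion to cross-check a module's lib/ folder against WEB-INF/lib/ can be partly automated by comparing JAR file names. Below is a minimal Python sketch, assuming the common `<artifactId>-<version>.jar` naming where the version starts with a digit. The regex, function names and sample file names are hypothetical. Classifiers, shaded JARs and the same classes packaged under different artifact names are not detected, so this only flags the obvious cases.

```python
import re
import sys
from pathlib import Path

# Assumption: JARs are named "<artifactId>-<version>.jar", with the version
# starting with a digit (e.g. "commons-lang3-3.12.0.jar").
JAR_NAME = re.compile(r"^(?P<artifact>.+?)-(?P<version>\d.*)\.jar$")


def parse_jar_names(names):
    """Map artifactId -> version for every name that looks like a versioned JAR."""
    result = {}
    for name in names:
        match = JAR_NAME.match(name)
        if match:
            result[match.group("artifact")] = match.group("version")
    return result


def find_conflicts(module_jars, core_jars):
    """Return {artifact: (module_version, core_version)} for artifacts that
    appear in both lists with different versions."""
    module = parse_jar_names(module_jars)
    core = parse_jar_names(core_jars)
    return {
        artifact: (module[artifact], core[artifact])
        for artifact in sorted(module.keys() & core.keys())
        if module[artifact] != core[artifact]
    }


def jar_names(directory):
    """List the *.jar file names found directly in a directory."""
    return [path.name for path in Path(directory).glob("*.jar")]


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python jarcheck.py <module lib dir> <WEB-INF/lib dir>
    conflicts = find_conflicts(jar_names(sys.argv[1]), jar_names(sys.argv[2]))
    for artifact, (module_version, core_version) in conflicts.items():
        print(f"{artifact}: module has {module_version}, core has {core_version}")
```

Any artifact this reports would clash if the module JARs were dropped into a shared lib folder, while classloader isolation via the package manager would keep the two versions apart.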
