This test failure report: http://fucit.org/solr-jenkins-reports/failure-report.html -- shows that HDFS tests have been failing for a while and are not receiving any love.
The best outcome for Solr's HDFS module is that a person or business that uses it steps up to help keep it maintained. We should probably remove the module until/unless that happens, if it ever does. If someone's looking into these failures to help out, please make your voice heard! Meanwhile, the removal appears to be underway: https://issues.apache.org/jira/browse/SOLR-17609 so there is some urgency to act.

On Sat, Dec 21, 2024 at 4:56 PM Arrieta, Alejandro <aarri...@perrinsoftware.com> wrote:

> Hi,
>
> I read both dev and user lists every day in lurker mode, and yes, I read the Hadoop auth mail threads.
> But I am too low on the food chain pyramid, like bottom level, lol :-)
> I mentioned this mail list thread in the internal appropriate chat room.
>
> 2 comments:
> 1) Most generally, final users of X feature that a vendor sells supports will not read the user and dev list because they buy that support. They will say "do the needful" to that vendor.
> Even if there are many of those final users, this is valid not only for Solr but also for other Apache and non-Apache projects.
> 2) Indexing to a local file system (spinning rust and even faster on SSD) is faster than indexing to a distributed file system, which is generally correct. That does not mean a distributed file system is no longer used or has advantages in specific scenarios, like indexers, mentioned later in the documentation.
>
> Now, back below my rock.
> Happy holidays to all.
>
> Alejandro Arrieta
>
> On Sat, Dec 21, 2024 at 5:46 PM David Smiley <dsmi...@apache.org> wrote:
>
> > +1 to LinkedIn. As I said, let's see if they even notice without you finding a POC to tell them. At least for a month, if you don't mind :-)
> > I'm really curious if we hear from them.
> >
> > On Sat, Dec 21, 2024 at 11:13 AM David Eric Pugh <de...@yahoo.com.invalid> wrote:
> >
> > > I will drop an email to the user list in January when folks are back to get some feedback.
> > > I will check my LinkedIn and see who I know that is involved in the Hadoop project and the companies that supported it and see if that shakes some feedback loose.
> > >
> > > In a perfect world, if we had someone excited about HDFS, they would either step up to become an involved committer in this project, OR take it over and move the code to their own independent repo.
> > > I did a bit more poking on Cloudera's site and they did an update to Solr 8.11.
> > > https://docs.cloudera.com/runtime/7.2.18/release-notes/topics/rt-pubc-whats-new-solr.html
> > > Also, I am not even sure that they use the HDFS setup!?? At least, the way I read
> > > https://community.cloudera.com/t5/Community-Articles/Understanding-Solr-Architecture-and-Best-practices/ta-p/248788
> > > is that you want to use "Local FS" for best performance. So it may be that they can continue to use SolrCloud without HDFS.
> > >
> > > On Saturday, December 21, 2024 at 10:03:49 AM EST, David Smiley <dsmi...@apache.org> wrote:
> > >
> > > Any such proposal should start with an attempt to solicit user input.
> > >
> > > I do wonder if we are doing enough to communicate important decisions like this to our users (to solicit this feedback). I worded that poorly maybe; I don't mean to suggest inadequacy on us necessarily as I have a greater concern on Solr users not adequately paying attention to the news/direction of the Solr project. Ah; I'm reminded sadly of a failed attempt to have a newsletter -- perhaps the perfect solution to this and other project engagement.
> > > I suggest we not reach out to specific users/organizations this time and we see who responds. We know of a certain organization who contributed these modules in the first place who almost certainly still use it.
> > > Let's see if our outreach efforts catch their notice (and they respond) or not.
> > >
> > > Obviously we should get rid of it if nobody will maintain it. My hope is that users/organizations step up and give the module whatever love it needs. Note that the HDFS module is Solr's *only* solution to something vaguely "cloud-native", at least a separation of storage from compute. It was only advertised for being about HDFS (the backend storage solution) and not advertised for its broader ability to use more modern choices like S3. I'm sure potential users made a hard pass on this because, of course, they don't have and don't want to run HDFS.
> > >
> > > On Sat, Dec 21, 2024 at 8:36 AM David Eric Pugh <de...@yahoo.com.invalid> wrote:
> > >
> > > > Should we remove the hdfs module from Solr 10? Inspired by some of the discussions in
> > > > https://lists.apache.org/thread/lltc0wjdghq18tt37zlrsd8ty35qsytl around removing the hadoop-auth module, I think that this is a real possibility.
> > > > I found some earlier work in
> > > > https://issues.apache.org/jira/browse/SOLR-14660 and
> > > > https://issues.apache.org/jira/browse/SOLR-14021 that put us on the path for removing hdfs from Solr.
> > > >
> > > > If folks wanted to make it a third party package or modules for Solr, the code has been separated, so that should be much more feasible.
> > > > I know a lot of folks are on holiday, so I won't make any commits till January when folks are back in order to let folks voice their opinions, however I may explore putting together a PR to remove it to see what that looks like...
> > > > Also, this is an interesting read:
> > > > https://cwiki.apache.org/confluence/display/SOLR/Deprecations
> > > > Eric