Solr already supports reading and indexing on cloud storage (ABFS, GCS, and S3) through the Hadoop HDFS module. I assume the same works for HDFS backup/restore as well. I haven't checked whether all the supporting libraries are included in the shipped Solr distribution, but the HDFS filesystem support covers cloud storage. I can't attest to the performance, but last I heard it works.
https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/index.html
https://hadoop.apache.org/docs/stable/hadoop-azure/index.html
https://github.com/GoogleCloudDataproc/hadoop-connectors

Kevin Risden

On Mon, Apr 24, 2023 at 5:15 PM Joel Bernstein <joels...@gmail.com> wrote:

> As far as a Lucene/Solr directory on cloud storage. Performance on the
> write has a lot of overhead per file, hundreds of millis. The read
> overhead is about half as much. I believe the write is so expensive due
> to the strong consistency of both gcs and s3. So I think the main
> bottleneck would be indexing and merging lots of small segments etc ...
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Fri, Apr 21, 2023 at 3:27 AM Ishan Chattopadhyaya
> <ichattopadhy...@gmail.com> wrote:
>
> > My colleague at SearchScale has tried S3FS, and running Solr indexes
> > off S3. We can chat about it, if you're interested.
> >
> > On Fri, 21 Apr, 2023, 10:38 am David Smiley, <dsmi...@apache.org>
> > wrote:
> >
> > > Cool!
> > > I wonder if anyone has tried such things for a Lucene/Solr
> > > "Directory" as well?
> > >
> > > ~ David Smiley
> > > Apache Lucene/Solr Search Developer
> > > http://www.linkedin.com/in/davidwsmiley
> > >
> > > On Mon, Apr 17, 2023 at 1:14 PM Joel Bernstein
> > > <joels...@gmail.com> wrote:
> > >
> > > > I've been testing Java NIO providers for cloud storage. These
> > > > two in particular worked for our use cases:
> > > >
> > > > https://github.com/googleapis/java-storage-nio
> > > > https://github.com/carlspring/s3fs-nio
> > > >
> > > > I believe an Azure provider is available.
> > > >
> > > > We've been working on sponsoring getting the s3 provider into a
> > > > public maven repo and I can update this thread when that's done.
> > > >
> > > > Joel Bernstein
> > > > http://joelsolr.blogspot.com/
> > > >
> > > > On Mon, Apr 10, 2023 at 6:51 PM Ishan Chattopadhyaya
> > > > <ichattopadhy...@gmail.com> wrote:
> > > >
> > > > > Oh thanks, Jan. I had missed it. It is a shame because it
> > > > > looks like a very neat project.
> > > > >
> > > > > On Mon, 10 Apr, 2023, 23:53 Jan Høydahl,
> > > > > <jan....@cominvent.com> wrote:
> > > > >
> > > > > > Looks like a nice project. With the promise of low-hanging
> > > > > > support for more providers than those three for free.
> > > > > >
> > > > > > However,
> > > > > > https://lists.apache.org/thread/w61gzk2ohjtshbwcb5gy6wb2htv7fo0x
> > > > > > does not look promising - they plan to move the project to
> > > > > > the Attic, and no new releases has happened during the 6
> > > > > > months since the proposal...
> > > > > >
> > > > > > Jan
> > > > > >
> > > > > > On 10 Apr 2023, at 19:08, Ishan Chattopadhyaya
> > > > > > <ichattopadhy...@gmail.com> wrote:
> > > > > >
> > > > > > > I think we should deprecate both the modules for S3 and
> > > > > > > GCS, and adopt Apache JCloud project that supports all
> > > > > > > three.