Solr already supports reading and indexing on cloud storage (ABFS, GCS,
and S3) through the Hadoop HDFS module. I assume the same works with
HDFS backup/restore as well. I haven't checked whether all the supporting
libraries are included in the shipped Solr distribution, but the HDFS
filesystem support includes cloud storage. I can't attest to the
performance, but last I heard it works.

https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/index.html
https://hadoop.apache.org/docs/stable/hadoop-azure/index.html
https://github.com/GoogleCloudDataproc/hadoop-connectors


Kevin Risden


On Mon, Apr 24, 2023 at 5:15 PM Joel Bernstein <joels...@gmail.com> wrote:

> As for a Lucene/Solr directory on cloud storage: writes carry a lot of
> overhead per file, hundreds of milliseconds. The read overhead is about
> half as much. I believe writes are so expensive because of the strong
> consistency of both GCS and S3. So I think the main bottleneck would be
> indexing and merging lots of small segments, etc.
>
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
>
> On Fri, Apr 21, 2023 at 3:27 AM Ishan Chattopadhyaya <
> ichattopadhy...@gmail.com> wrote:
>
> > My colleague at SearchScale has tried S3FS, and running Solr indexes off
> > S3. We can chat about it, if you're interested.
> >
> > On Fri, 21 Apr, 2023, 10:38 am David Smiley, <dsmi...@apache.org> wrote:
> >
> > > Cool!
> > > I wonder if anyone has tried such things for a Lucene/Solr "Directory"
> > > as well?
> > >
> > > ~ David Smiley
> > > Apache Lucene/Solr Search Developer
> > > http://www.linkedin.com/in/davidwsmiley
> > >
> > >
> > > On Mon, Apr 17, 2023 at 1:14 PM Joel Bernstein <joels...@gmail.com> wrote:
> > >
> > > > I've been testing Java NIO providers for cloud storage. These two in
> > > > particular worked for our use cases:
> > > >
> > > > https://github.com/googleapis/java-storage-nio
> > > > https://github.com/carlspring/s3fs-nio
> > > >
> > > > I believe an Azure provider is available.
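
For readers unfamiliar with NIO providers, here is a minimal sketch of how one is typically consumed. It is not taken from this thread: the class and method names (`NioProviderSketch`, `installedSchemes`, `roundTrip`) are hypothetical, and it uses only the JDK's built-in `file` provider so that it runs without extra dependencies. The assumption is that a cloud provider on the classpath would be reached the same way through its own URI scheme, although some providers may first need a `FileSystems.newFileSystem(...)` call with credentials.

```java
import java.io.IOException;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.spi.FileSystemProvider;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration class; not part of Solr or of either provider project.
class NioProviderSketch {

    /** Returns the URI scheme of every NIO provider visible to this JVM. */
    static List<String> installedSchemes() {
        List<String> schemes = new ArrayList<>();
        for (FileSystemProvider provider : FileSystemProvider.installedProviders()) {
            schemes.add(provider.getScheme());
        }
        return schemes;
    }

    /** Writes a small file and reads it back through whichever provider owns the URI scheme. */
    static String roundTrip(URI location, String content) throws IOException {
        // Paths.get(URI) throws FileSystemNotFoundException when no provider matches the scheme.
        Path path = Paths.get(location);
        Files.write(path, content.getBytes(StandardCharsets.UTF_8));
        return new String(Files.readAllBytes(path), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        System.out.println("Installed schemes: " + installedSchemes());
        // Assumption: a local temp file stands in for a cloud object here.
        Path tmp = Files.createTempFile("nio-sketch", ".txt");
        try {
            System.out.println(roundTrip(tmp.toUri(), "hello"));
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

Swapping the temp-file URI for a provider-specific one (the exact URI form depends on the provider) should be the only change needed in `main`, assuming the provider is installed. Note that every `Files.write` call is a separate per-file operation against the backing store.
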
> > > >
> > > > We've been working on sponsoring getting the s3 provider into a public
> > > > maven repo and I can update this thread when that's done.
> > > >
> > > >
> > > >
> > > > Joel Bernstein
> > > > http://joelsolr.blogspot.com/
> > > >
> > > >
> > > > On Mon, Apr 10, 2023 at 6:51 PM Ishan Chattopadhyaya <
> > > > ichattopadhy...@gmail.com> wrote:
> > > >
> > > > > Oh thanks, Jan. I had missed it. It is a shame because it looks like a
> > > > > very neat project.
> > > > >
> > > > > On Mon, 10 Apr, 2023, 23:53 Jan Høydahl, <jan....@cominvent.com> wrote:
> > > > >
> > > > > > Looks like a nice project, with the promise of low-hanging support
> > > > > > for more providers than those three for free.
> > > > > >
> > > > > > However,
> > > > > > https://lists.apache.org/thread/w61gzk2ohjtshbwcb5gy6wb2htv7fo0x
> > > > > > does not look promising - they plan to move the project to the Attic,
> > > > > > and no new releases have happened during the 6 months since the
> > > > > > proposal...
> > > > > >
> > > > > > Jan
> > > > > >
> > > > > > > On 10 Apr 2023, at 19:08, Ishan Chattopadhyaya
> > > > > > > <ichattopadhy...@gmail.com> wrote:
> > > > > > >
> > > > > > > I think we should deprecate both the modules for S3 and GCS, and
> > > > > > > adopt the Apache jclouds project, which supports all three.
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
