[ https://issues.apache.org/jira/browse/SOLR-16697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jason Gerlowski resolved SOLR-16697.
------------------------------------
Fix Version/s: 9.3
Resolution: Fixed
Alright! After a bit of fiddling with some flaky tests, it looks like the test
suite is happy with this change. I backported to 9.x this morning (so it'll be
in the 9.3 release), and I'm going to close this out for now.
If anyone gets a chance to try this out, I'd love feedback on the functionality
and interface. We've still got some flexibility to change things before this
goes out the door in 9.3!
> New API support to import index files generated by Embedded SOLR into SOLR Cloud
> --------------------------------------------------------------------------------
>
> Key: SOLR-16697
> URL: https://issues.apache.org/jira/browse/SOLR-16697
> Project: Solr
> Issue Type: New Feature
> Components: Backup/Restore
> Reporter: Indumathy Rajagopalan
> Assignee: Jason Gerlowski
> Priority: Major
> Fix For: 9.3
>
> Time Spent: 1h 20m
> Remaining Estimate: 0h
>
> Offline indexing is a popular option when really large data sets need to be
> indexed into SOLR.
> Data is loaded from a data source (e.g. Cassandra, "C*") and index creation
> pipelines produce index files per shard using embedded SOLR.
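For the per-shard pipelines to produce indexes SOLR Cloud can serve, each document presumably has to land in the shard that SOLR's own router would pick for it. A minimal Python sketch, assuming the collection uses the default compositeId router with plain ids (no `!` routing prefix) and that the shard hash ranges are copied by hand from the collection's cluster state. It re-implements MurmurHash3 x86_32 over the id's UTF-8 bytes for illustration; it is not SOLR code:

```python
def murmurhash3_x86_32(data: bytes, seed: int = 0) -> int:
    """MurmurHash3 (x86, 32-bit), returned as a signed 32-bit int like a Java int."""
    c1, c2, mask = 0xCC9E2D51, 0x1B873593, 0xFFFFFFFF
    h = seed & mask
    n_blocks = len(data) // 4
    # Body: mix each 4-byte little-endian block into the hash state.
    for i in range(n_blocks):
        k = int.from_bytes(data[4 * i:4 * i + 4], "little")
        k = (k * c1) & mask
        k = ((k << 15) | (k >> 17)) & mask
        k = (k * c2) & mask
        h ^= k
        h = ((h << 13) | (h >> 19)) & mask
        h = (h * 5 + 0xE6546B64) & mask
    # Tail: fold in the remaining 1-3 bytes (treated as unsigned).
    tail = data[4 * n_blocks:]
    k = 0
    if len(tail) >= 3:
        k ^= tail[2] << 16
    if len(tail) >= 2:
        k ^= tail[1] << 8
    if len(tail) >= 1:
        k ^= tail[0]
        k = (k * c1) & mask
        k = ((k << 15) | (k >> 17)) & mask
        k = (k * c2) & mask
        h ^= k
    # Finalization mix (fmix32).
    h ^= len(data)
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & mask
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & mask
    h ^= h >> 16
    return h - (1 << 32) if h & 0x80000000 else h


def parse_range(spec: str) -> tuple[int, int]:
    """Parse a hash range such as '80000000-ffffffff' into signed 32-bit bounds."""
    def to_signed(hex_str: str) -> int:
        v = int(hex_str, 16)
        return v - (1 << 32) if v & 0x80000000 else v
    lo_hex, hi_hex = spec.split("-")
    return to_signed(lo_hex), to_signed(hi_hex)


def shard_for_id(doc_id: str, shard_ranges: dict[str, str]) -> str:
    """Return the shard whose hash range contains the hash of doc_id."""
    h = murmurhash3_x86_32(doc_id.encode("utf-8"))
    for shard, spec in shard_ranges.items():
        lo, hi = parse_range(spec)
        if lo <= h <= hi:
            return shard
    raise ValueError(f"no shard range covers hash {h} for id {doc_id!r}")
```

For a two-shard collection, `shard_for_id("doc-42", {"shard1": "80000000-ffffffff", "shard2": "0-7fffffff"})` returns whichever shard's range contains that id's hash. Ids with a `!` prefix, or a custom router, would need different logic.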
>
> With older versions of SOLR, we would copy these index files into the SOLR
> Cloud data directories using custom tools and reload the collection to be able
> to search/update the newly uploaded collection.
> Ideally, we should use the Restore API to import the index files from a backup
> repository. However, the file structure the Restore API expects is complex
> enough that massaging the index files in every shard into a Restore-compatible
> format is infeasible.
>
> It would be good for SOLR to support a 'Restore'-like API that would allow us
> to import index files generated by embedded SOLR into SOLR Cloud. This API
> should operate at the shard level and import the index files into a single
> shard per invocation.
>
> *With the new API, offline indexing could look like this:*
>
> 1. Generate index files per shard using embedded SOLR as part of Hadoop
> MapReduce/Spark jobs, and copy all index files for every shard into a backup
> repository.
>
> 2. The new API should be able to import the index from the backup repository
> location into each shard on SOLR Cloud. The API would handle things like
> marking the collection as read-only, triggering replication, etc., along the
> lines of what the 'RESTORE' API currently does.
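Step 1 could be sketched as below, assuming the backup repository is a shared filesystem (e.g. an NFS mount visible to every SOLR node) and using a hypothetical `<repo_root>/<collection>/<shard>` layout. HDFS, S3 or GCS repositories would need their own client instead of `shutil`:

```python
import shutil
from pathlib import Path


def stage_shard_indexes(local_indexes: dict[str, Path], repo_root: Path,
                        collection: str) -> dict[str, str]:
    """Copy each shard's embedded-SOLR index directory into the repository.

    Uses the hypothetical layout <repo_root>/<collection>/<shard>/ and returns
    a shard -> location map to pass to the import API, one shard at a time.
    """
    locations: dict[str, str] = {}
    for shard, index_dir in sorted(local_indexes.items()):
        target = repo_root / collection / shard
        # Refuse to overwrite an existing shard directory in the repository.
        if target.exists():
            raise FileExistsError(f"{target} already exists; refusing to overwrite")
        # copytree creates any missing parent directories of `target`.
        shutil.copytree(index_dir, target)
        locations[shard] = str(target)
    return locations
```

The returned map is what step 2 would feed, one shard per invocation, to the new API's `location` parameter.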
>
> The new API should support the relevant parameters from the Restore API
> (location and repository).
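Issuing one request per shard might look like the sketch below. It assumes the API is exposed as the Collections API action `INSTALLSHARDDATA` taking `collection`, `shard`, `location` and `repository` (check the 9.3 Reference Guide before relying on these names); the base URL, collection name and the repository name `nfs-repo` are hypothetical. The code only builds the URLs; it does not send them:

```python
from urllib.parse import urlencode


def install_shard_data_url(solr_base: str, collection: str, shard: str,
                           location: str, repository: str) -> str:
    """Build one shard-level import request (assumed action name INSTALLSHARDDATA)."""
    params = {
        "action": "INSTALLSHARDDATA",  # assumption: verify against the 9.3 docs
        "collection": collection,
        "shard": shard,
        "location": location,
        "repository": repository,
    }
    return f"{solr_base.rstrip('/')}/admin/collections?{urlencode(params)}"


def plan_install_requests(solr_base: str, collection: str,
                          locations: dict[str, str], repository: str) -> list[str]:
    """One request per shard, since the API imports a single shard per call."""
    return [
        install_shard_data_url(solr_base, collection, shard, loc, repository)
        for shard, loc in sorted(locations.items())
    ]
```

Each URL could then be sent with `urllib.request.urlopen` or `curl`; `repository` must name a backup repository configured in `solr.xml`.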
--
This message was sent by Atlassian Jira
(v8.20.10#820010)