[ https://issues.apache.org/jira/browse/SOLR-16697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17705640#comment-17705640 ]
Jason Gerlowski commented on SOLR-16697:
----------------------------------------
I've merged this to main and plan on backporting soon. But I noticed that over
the weekend there were a few test failures for S3InstallShardTest, a new test
added in this commit.
AFAICT those failures come from an OOM in the test JVM:
{code}
2> Caused by: java.lang.OutOfMemoryError: Java heap space
2> at org.apache.solr.s3.S3BackupRepository.copyIndexFileTo(S3BackupRepository.java:348) ~[main/:?]
2> at org.apache.solr.core.TrackingBackupRepository.copyIndexFileTo(TrackingBackupRepository.java:149) ~[solr-test-framework-10.0.0-SNAPSHOT.jar:10.0.0-SNAPSHOT 81fe0045aaa63b115808a8c3d87ce96dc7921e8b [snapshot build, details omitted]]
2> at org.apache.solr.core.backup.repository.BackupRepository.copyFileTo(BackupRepository.java:191) ~[solr-core-10.0.0-SNAPSHOT.jar:10.0.0-SNAPSHOT 81fe0045aaa63b115808a8c3d87ce96dc7921e8b [snapshot build, details omitted]]
2> at org.apache.solr.handler.RestoreCore$BasicRestoreRepository.repoCopy(RestoreCore.java:242) ~[solr-core-10.0.0-SNAPSHOT.jar:10.0.0-SNAPSHOT 81fe0045aaa63b115808a8c3d87ce96dc7921e8b [snapshot build, details omitted]]
2> at org.apache.solr.handler.RestoreCore.doRestore(RestoreCore.java:132) ~[solr-core-10.0.0-SNAPSHOT.jar:10.0.0-SNAPSHOT 81fe0045aaa63b115808a8c3d87ce96dc7921e8b [snapshot build, details omitted]]
{code}
It might just be a fluke, but it's also possible that S3InstallShardTest is a
bad citizen memory-wise. Anyway, I plan to give this a few more days to see if
the failure recurs before backporting.
> New API support to import index files generated by Embedded SOLR into SOLR
> Cloud
> --------------------------------------------------------------------------------
>
> Key: SOLR-16697
> URL: https://issues.apache.org/jira/browse/SOLR-16697
> Project: Solr
> Issue Type: New Feature
> Components: Backup/Restore
> Reporter: Indumathy Rajagopalan
> Assignee: Jason Gerlowski
> Priority: Major
> Time Spent: 1h 20m
> Remaining Estimate: 0h
>
> Offline indexing is a popular option when really large data sets need to be
> indexed into SOLR.
> Data is loaded from a data source (e.g. c*), and index creation pipelines
> produce index files per shard using embedded SOLR.
>
> With older versions of SOLR, we would copy these index files into SOLR Cloud
> data directories using custom tools and reload the collection to be able to
> search/update on the newly uploaded collection.
> Ideally, we should use the Restore API to import the index files from a backup
> repository. However, the file structure expected for the Restore API to work
> is complex enough that massaging the index files in every shard into a
> Restore-compatible format is infeasible.
>
> It would be good for SOLR to support a 'Restore'-like API that would allow us
> to import index files generated by embedded SOLR into SOLR Cloud. This API
> should operate at the shard level and be able to import the index files into a
> single shard (per invocation).
>
> *With the new API, offline indexing could look like this:*
>
> 1. Generate index files per shard using embedded SOLR as part of Hadoop
> MR/Spark jobs and copy all index files for every shard into a backup repository.
>
> 2. The new API should be able to import the index from the backup repository
> location into each shard on SOLR Cloud. The API would handle things like
> marking the collection as read-only, triggering replication, etc., along the
> lines of what the 'RESTORE' API currently does.
>
> The new API should be able to support the relevant parameters from the Restore
> API (location & repository).
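
To make the per-shard flow in the description concrete, here is a rough sketch of how a client might build one import request per shard, each carrying the `location` and `repository` parameters. This is an illustration under stated assumptions, not the API added by this issue: the action name `IMPORT_SHARD_INDEX`, the class and method names, and the `<baseLocation>/<shardName>` layout in the backup repository are all hypothetical.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Sketch only: builds one hypothetical per-shard import request per shard. */
final class ShardImportSketch {

  // Hypothetical action name; the issue description does not name the API.
  static final String HYPOTHETICAL_ACTION = "IMPORT_SHARD_INDEX";

  /** Builds the URL-encoded query string for importing index files into a single shard. */
  static String buildQuery(String collection, String shard, String repository, String location) {
    // LinkedHashMap keeps parameters in insertion order, so the output is deterministic.
    Map<String, String> params = new LinkedHashMap<>();
    params.put("action", HYPOTHETICAL_ACTION);
    params.put("collection", collection);
    params.put("shard", shard);
    params.put("repository", repository); // same meaning as the Restore API parameter
    params.put("location", location);     // same meaning as the Restore API parameter

    StringBuilder query = new StringBuilder();
    for (Map.Entry<String, String> entry : params.entrySet()) {
      if (query.length() > 0) {
        query.append('&');
      }
      query.append(URLEncoder.encode(entry.getKey(), StandardCharsets.UTF_8))
          .append('=')
          .append(URLEncoder.encode(entry.getValue(), StandardCharsets.UTF_8));
    }
    return query.toString();
  }

  /**
   * One request per shard, since the API is meant to import into a single shard per invocation.
   * Assumes (hypothetically) that each shard's files were copied to baseLocation/shardName.
   */
  static List<String> buildQueries(
      String collection, List<String> shards, String repository, String baseLocation) {
    List<String> queries = new ArrayList<>();
    for (String shard : shards) {
      queries.add(buildQuery(collection, shard, repository, baseLocation + "/" + shard));
    }
    return queries;
  }
}
```

A caller would issue each resulting request in turn (or in parallel) after the offline job has finished copying files, for example `ShardImportSketch.buildQueries("products", List.of("shard1", "shard2"), "s3", "offline/products")`.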