[
https://issues.apache.org/jira/browse/SOLR-16524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17629356#comment-17629356
]
Joel Bernstein commented on SOLR-16524:
---------------------------------------
The big pain point is getting the bytes from the index. Another way to
alleviate this pain point is to have a listener that extracts specific fields
on commit to binary files on disk. The format of these files would be:
- 4 byte lucene id
- 1 byte data length
- N bytes data
The HashQParserPlugin could then rip through these files and build the hash's
rather than extracting the bytes from the index.
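The record format above can be sketched as a simple fixed-header encoding. This is a minimal illustration, not Solr code: the class and method names are hypothetical, and it assumes the 1-byte length field caps field data at 255 bytes.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;

public class DocFieldFile {
    // Write one record: 4-byte lucene id, 1-byte data length, N bytes of data.
    // The 1-byte length implies data must fit in 255 bytes.
    static void write(DataOutputStream out, int luceneId, byte[] data) throws IOException {
        if (data.length > 255) {
            throw new IllegalArgumentException("data too long for 1-byte length field");
        }
        out.writeInt(luceneId);      // 4 bytes
        out.writeByte(data.length);  // 1 byte
        out.write(data);             // N bytes
    }

    // Sequentially scan the file, returning lucene id -> field bytes.
    static Map<Integer, byte[]> readAll(DataInputStream in) throws IOException {
        Map<Integer, byte[]> records = new LinkedHashMap<>();
        while (in.available() >= 5) {        // need at least id + length header
            int id = in.readInt();
            int len = in.readUnsignedByte(); // length stored as unsigned byte
            byte[] data = new byte[len];
            in.readFully(data);
            records.put(id, data);
        }
        return records;
    }
}
```

Reading such a file is a single sequential scan with no index access, which is where the speedup over per-document field extraction would come from.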
> Index time hash partitioning
> ----------------------------
>
> Key: SOLR-16524
> URL: https://issues.apache.org/jira/browse/SOLR-16524
> Project: Solr
> Issue Type: Improvement
> Security Level: Public (Default Security Level. Issues are Public)
> Reporter: Joel Bernstein
> Priority: Major
>
> Both Streaming Expressions and Spark-Solr currently rely on query time hash
> partitioning using the HashQParserPlugin. The query time hash partitioning,
> although extremely flexible, is very slow when it builds its initial filters.
> This ticket will add an indexing time hash partitioner that Streaming
> Expressions and Spark-Solr will both be able to use.
> When this ticket is complete I'll also update the ParallelStream and
> Spark-Solr to be able to use the index time partitioning rather than the
> HashQParserPlugin.
> This is a stepping stone towards much more performant parallel distributed
> joins.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]