[
https://issues.apache.org/jira/browse/SOLR-16524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17629356#comment-17629356
]
Joel Bernstein commented on SOLR-16524:
---------------------------------------
The big pain point is getting the bytes from the index. Another way to
alleviate this pain point is to have a listener that extracts specific fields
on commit to binary files on disk. The format of these files would be:
- 4 byte lucene id
- 1 byte data length
- N bytes data
The HashQParserPlugin could then rip through these files and build the hash's
rather than extracting the bytes from the index.
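The record format above can be sketched as a simple fixed-header encoding. This is a minimal illustration, not Solr code: the class and method names are hypothetical, and it assumes the 1-byte length field caps field data at 255 bytes.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;

public class DocFieldFile {
    // Write one record: 4-byte lucene id, 1-byte data length, N bytes of data.
    // The 1-byte length implies data must fit in 255 bytes.
    static void write(DataOutputStream out, int luceneId, byte[] data) throws IOException {
        if (data.length > 255) {
            throw new IllegalArgumentException("data too long for 1-byte length field");
        }
        out.writeInt(luceneId);      // 4 bytes
        out.writeByte(data.length);  // 1 byte
        out.write(data);             // N bytes
    }

    // Sequentially scan the file, returning lucene id -> field bytes.
    static Map<Integer, byte[]> readAll(DataInputStream in) throws IOException {
        Map<Integer, byte[]> records = new LinkedHashMap<>();
        while (in.available() >= 5) {        // need at least id + length header
            int id = in.readInt();
            int len = in.readUnsignedByte(); // length stored as unsigned byte
            byte[] data = new byte[len];
            in.readFully(data);
            records.put(id, data);
        }
        return records;
    }
}
```

Reading such a file is a single sequential scan with no index access, which is where the speedup over per-document field extraction would come from.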
> Index time hash partitioning
> ----------------------------
>
> Key: SOLR-16524
> URL: https://issues.apache.org/jira/browse/SOLR-16524
> Project: Solr
> Issue Type: Improvement
> Security Level: Public (Default Security Level. Issues are Public)
> Reporter: Joel Bernstein
> Priority: Major
>
> Both Streaming Expressions and Spark-Solr currently rely on query time hash
> partitioning using the HashQParserPlugin. The query time hash partitioning,
> although extremely flexible, is very slow when it builds its initial filters.
> This ticket will add an indexing time hash partitioner that Streaming
> Expressions and Spark-Solr will both be able to use.
> When this ticket is complete I'll also update the ParallelStream and
> Spark-Solr to be able to use the index time partitioning rather than the
> HashQParserPlugin.
> This is a stepping stone towards much more performant parallel distributed
> joins.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]