[jira] [Comment Edited] (SOLR-16524) Index time hash partitioning

Joel Bernstein (Jira) Mon, 07 Nov 2022 07:24:04 -0800


    [ https://issues.apache.org/jira/browse/SOLR-16524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17629872#comment-17629872 ]


Joel Bernstein edited comment on SOLR-16524 at 11/7/22 3:23 PM:
----------------------------------------------------------------

BinaryDocValues could possibly achieve similar data access patterns. We'd need 
to dig into the Lucene machinery that reads binary doc values to be sure. I 
suspect there is a lot of overhead in the part of that machinery that deals 
with compression, and the cost depends on the level of abstraction at which 
the reads are done. It would be interesting to time how fast binary doc values 
can be iterated. 


> Index time hash partitioning
> ----------------------------
>
>                 Key: SOLR-16524
>                 URL: https://issues.apache.org/jira/browse/SOLR-16524
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Joel Bernstein
>            Priority: Major
>
> Both Streaming Expressions and Spark-Solr currently rely on query time hash 
> partitioning using the HashQParserPlugin. The query time hash partitioning, 
> although extremely flexible, is very slow when it builds its initial filters. 
> This ticket will add an indexing time hash partitioner that Streaming 
> Expressions and Spark-Solr will both be able to use.
> When this ticket is complete I'll also update the ParallelStream and 
> Spark-Solr to be able to use the index time partitioning rather than the 
> HashQParserPlugin.
> This is a stepping stone towards much more performant parallel distributed 
> joins.
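
To make the difference between the two approaches concrete, here is a minimal, self-contained sketch. It is not the HashQParserPlugin code or the design proposed in this ticket. The class name, the Doc layout, the FNV-1a hash, and the choice to store the raw hash (rather than a partition id) are all assumptions made for illustration. The query-time path rehashes every join key on each request, while the index-time path reuses a hash computed once when the document was added.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class HashPartitionSketch {

    /** Hypothetical document: the key hash is computed once, at "index time". */
    static final class Doc {
        final String joinKey;
        final int keyHash;

        Doc(String joinKey) {
            this.joinKey = joinKey;
            this.keyHash = hash(joinKey);
        }
    }

    /** FNV-1a over UTF-8 bytes; a stand-in for whatever hash a real implementation uses. */
    static int hash(String key) {
        int h = 0x811c9dc5;
        for (byte b : key.getBytes(StandardCharsets.UTF_8)) {
            h ^= (b & 0xff);
            h *= 0x01000193;
        }
        return h;
    }

    /** Maps a hash to a worker in [0, numWorkers); floorMod handles negative hashes. */
    static int partition(int hash, int numWorkers) {
        return Math.floorMod(hash, numWorkers);
    }

    /** Query-time partitioning: every key is rehashed for every request. */
    static List<Doc> queryTimeFilter(List<Doc> docs, int worker, int numWorkers) {
        List<Doc> out = new ArrayList<>();
        for (Doc d : docs) {
            if (partition(hash(d.joinKey), numWorkers) == worker) {
                out.add(d);
            }
        }
        return out;
    }

    /** Index-time partitioning: the stored hash is reused, so no key is rehashed. */
    static List<Doc> indexTimeFilter(List<Doc> docs, int worker, int numWorkers) {
        List<Doc> out = new ArrayList<>();
        for (Doc d : docs) {
            if (partition(d.keyHash, numWorkers) == worker) {
                out.add(d);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Doc> docs = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            docs.add(new Doc("key-" + i));
        }
        int numWorkers = 4;
        for (int w = 0; w < numWorkers; w++) {
            System.out.println("worker " + w + ": "
                + indexTimeFilter(docs, w, numWorkers).size() + " docs");
        }
    }
}
```

Storing the hash instead of a fixed partition id is one possible design choice: it keeps the number of workers adjustable at query time, at the cost of one modulo per document.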



