[jira] [Updated] (SOLR-16524) Index time hash partitioning

Joel Bernstein (Jira) Fri, 04 Nov 2022 13:38:06 -0700


     [ https://issues.apache.org/jira/browse/SOLR-16524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Joel Bernstein updated SOLR-16524:
----------------------------------
    Description: 
Both Streaming Expressions and Spark-Solr currently rely on query time hash 
partitioning using the HashQParserPlugin. Query time hash partitioning, 
although extremely flexible, is very slow when it builds its initial filters. 

This ticket will add an index time hash partitioner that Streaming 
Expressions and Spark-Solr will both be able to use.

When this ticket is complete I'll also update the ParallelStream and Spark-Solr 
to be able to use the index time partitioning rather than the HashQParserPlugin.

This is a stepping stone towards much more performant parallel distributed 
joins.
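To make the contrast above concrete, here is a minimal Java sketch of the general idea behind index time hash partitioning. It is not the implementation this ticket will add. Each document is assigned a partition once, when it is indexed, so that at query time a parallel worker can select its share with a simple equality filter instead of hashing documents on the fly. The class `IndexTimePartitioner`, the field name `partition_i`, and the use of `String.hashCode()` with `Math.floorMod` are illustrative assumptions; the hash used by the HashQParserPlugin or by the eventual index time partitioner may differ.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class IndexTimePartitioner {

    // Hypothetical field holding the precomputed partition; not a real Solr field.
    static final String PARTITION_FIELD = "partition_i";

    // Map a partition key to one of numWorkers buckets.
    // Math.floorMod keeps the result non-negative even when hashCode() is negative.
    static int partitionFor(String key, int numWorkers) {
        return Math.floorMod(key.hashCode(), numWorkers);
    }

    // Index time: stamp a copy of the document with its partition, computed once.
    static Map<String, Object> tagDocument(Map<String, Object> doc, String keyField, int numWorkers) {
        Map<String, Object> tagged = new HashMap<>(doc);
        tagged.put(PARTITION_FIELD, partitionFor(String.valueOf(doc.get(keyField)), numWorkers));
        return tagged;
    }

    // Query time: worker w only needs an equality match on the stored field
    // (in a real index this would be a filter query), with no hashing per request.
    static List<Map<String, Object>> docsForWorker(List<Map<String, Object>> index, int worker) {
        List<Map<String, Object>> out = new ArrayList<>();
        for (Map<String, Object> doc : index) {
            if (((Integer) doc.get(PARTITION_FIELD)).intValue() == worker) {
                out.add(doc);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        int numWorkers = 3;
        List<Map<String, Object>> index = new ArrayList<>();
        for (String id : new String[] {"a", "b", "c", "d", "e", "f"}) {
            Map<String, Object> doc = new HashMap<>();
            doc.put("id", id);
            index.add(tagDocument(doc, "id", numWorkers));
        }
        for (int w = 0; w < numWorkers; w++) {
            System.out.println("worker " + w + " -> " + docsForWorker(index, w).size() + " docs");
        }
    }
}
```

Because the partition depends only on the key, every document with the same key lands in the same partition, and each worker's filter is a cheap lookup on a value that was already paid for at index time.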

  was:
Both Streaming Expressions and Spark-Solr currently rely on the query time hash 
partitioning using the HashQParserPlugin. The query time hash partitioning, 
although extremely flexible, is also very slow when it builds its initial 
filters. 

This ticket will add an indexing time hash partitioner that Streaming 
Expressions and Spark-Solr will both be able to use.

When this ticket is complete I'll also update the ParallelStream and Spark-Solr 
to be able to use the index time partitioning rather than the HashQParserPlugin.

This is a stepping stone towards much more performant parallel distributed 
joins.


> Index time hash partitioning
> ----------------------------
>
>                 Key: SOLR-16524
>                 URL: https://issues.apache.org/jira/browse/SOLR-16524
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public (Default Security Level. Issues are Public) 
>            Reporter: Joel Bernstein
>            Priority: Major
>
> Both Streaming Expressions and Spark-Solr currently rely on query time hash 
> partitioning using the HashQParserPlugin. Query time hash partitioning, 
> although extremely flexible, is very slow when it builds its initial filters. 
> This ticket will add an index time hash partitioner that Streaming 
> Expressions and Spark-Solr will both be able to use.
> When this ticket is complete I'll also update the ParallelStream and 
> Spark-Solr to be able to use the index time partitioning rather than the 
> HashQParserPlugin.
> This is a stepping stone towards much more performant parallel distributed 
> joins.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
