[ https://issues.apache.org/jira/browse/SOLR-16524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17629880#comment-17629880 ]
David Smiley commented on SOLR-16524:
-------------------------------------
You are talking about the overhead of docValues ordinals. But wouldn't a
pre-computed hash be numeric, and so not involve ordinals at all?
Beware: doing this exotic thing will have code maintenance costs. Keeping new
files synchronized with the index is not easy. That dev effort will be paid by
you now and by the project henceforth. Personally, I'm -0 because it's not yet
evident to me why the requirements cannot be met with index sorting.
> Index time hash partitioning
> ----------------------------
>
> Key: SOLR-16524
> URL: https://issues.apache.org/jira/browse/SOLR-16524
> Project: Solr
> Issue Type: Improvement
> Security Level: Public(Default Security Level. Issues are Public)
> Reporter: Joel Bernstein
> Priority: Major
>
> Both Streaming Expressions and Spark-Solr currently rely on query time hash
> partitioning using the HashQParserPlugin. The query time hash partitioning,
> although extremely flexible, is very slow when it builds its initial filters.
> This ticket will add an indexing time hash partitioner that Streaming
> Expressions and Spark-Solr will both be able to use.
> When this ticket is complete I'll also update the ParallelStream and
> Spark-Solr to be able to use the index time partitioning rather than the
> HashQParserPlugin.
> This is a stepping stone towards much more performant parallel distributed
> joins.
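
For readers new to the idea, here is a minimal sketch of hash partitioning on a pre-computed numeric hash. It is plain Java with no Solr dependencies and is not the ticket's design. The class name `HashPartitionSketch` and its methods are hypothetical, and `String.hashCode()` is only a stand-in for whatever hash would actually be stored at index time; it is not the function used by the HashQParserPlugin.

```java
import java.util.ArrayList;
import java.util.List;

public class HashPartitionSketch {

    // Hypothetical stand-in for a hash computed once at index time and
    // stored as a numeric value alongside the document.
    // Assumption: String.hashCode() is used purely for illustration.
    static int precomputedHash(String key) {
        return key.hashCode();
    }

    // Map a stored numeric hash to a worker index in [0, numWorkers).
    // floorMod keeps the result non-negative even for negative hashes.
    static int partitionOf(int hash, int numWorkers) {
        return Math.floorMod(hash, numWorkers);
    }

    // Group partition keys by the worker that would receive them.
    static List<List<String>> partition(List<String> keys, int numWorkers) {
        List<List<String>> buckets = new ArrayList<>();
        for (int i = 0; i < numWorkers; i++) {
            buckets.add(new ArrayList<>());
        }
        for (String key : keys) {
            int worker = partitionOf(precomputedHash(key), numWorkers);
            buckets.get(worker).add(key);
        }
        return buckets;
    }

    public static void main(String[] args) {
        // Every occurrence of a key lands on the same worker, which is the
        // property a parallel join relies on.
        List<List<String>> buckets =
            partition(List.of("a", "b", "c", "d", "e"), 2);
        for (int i = 0; i < buckets.size(); i++) {
            System.out.println("worker " + i + ": " + buckets.get(i));
        }
    }
}
```

With the hash stored ahead of time, assigning a document to a worker at query time reduces to one integer modulo per document, rather than hashing the key while the filter is being built.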