[
https://issues.apache.org/jira/browse/SOLR-9166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Erick Erickson updated SOLR-9166:
---------------------------------
Attachment: SOLR-9166.patch
[~rohitcse] I had some time today, so I started on this patch.
This is a checkpoint of what I have so far. I got it this far and ran into a
few things I thought I'd run by folks. There are lots of nocommits and the like
currently, as well as new failing tests. But it's progress....
[[email protected]] [~joel.bernstein] [~dpgove] I'd be particularly interested
in your takes.
1> My base assumption is that sorting during export should return docs in the
same order as using the /select handler. Currently this doesn't happen; the new
test I wrote fails all over the place. I'm not quite sure why yet, but I just
got all this to semi-work so I'm checkpointing.
2> I want to fold the two parameters into a single on/off
returnDefaultsForMissing which defaults to "false". This would mean there's
really no way to get the old behavior where numerics return zero and strings
return null. Is that OK? I think it's easier to explain something like
"defaults for numerics are zero, default for string is "", default for boolean
is "false" and default for date is in 1970". But see <4>.
3> Does it make any sense to support sortMissingFirst/Last? My initial take is
"no", since what matters is consistent sorting. That said, I started down that
road before wondering whether it was desirable, so this patch has
sortMissingFirstLast in the test. It'll be removed unless there are objections.
4> [[email protected]]: Your comment about using functions is interesting. I'll
take a look at that now that I have a clue what the problem is. It's certainly
more elegant than some new flag I think and allows the user to put anything at
all in rather than us deciding what a "proper" default is. Do you have any
advice on how to access the defined default for the fields in
SortingResponseWriter since that's where I need to trap this? (being lazy here).
5> I @Ignored all the rest of the tests except the new one to be able to beast
the new stuff; they'll be un-ignored before committing.
6> Despite my comment on the dev list, after looking into this I don't think we
want to force it into 6.3. I think there'll be some ramifications we'll need to
bake out.
No doubt more later when we get some advice on how to continue.
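
To make <2> concrete, here is a rough Python sketch of the proposed semantics.
It is an illustration only, not Solr code: the function name, the type names,
and the exact epoch string are all hypothetical, and it assumes that with the
flag off a missing field is simply left out of the exported doc.

```python
# Hypothetical illustration of the returnDefaultsForMissing idea in <2>.
# None of these names exist in Solr; they are assumptions for the sketch.

EPOCH = "1970-01-01T00:00:00Z"  # assumed rendering of "date is in 1970"

# Assumed per-type defaults, following the wording in <2>.
TYPE_DEFAULTS = {
    "int": 0,
    "long": 0,
    "float": 0.0,
    "double": 0.0,
    "string": "",
    "boolean": False,
    "date": EPOCH,
}


def export_doc(doc, field_types, return_defaults_for_missing=False):
    """Build the exported view of `doc` for the requested fields.

    doc         -- dict of field name -> stored value
    field_types -- dict of requested field name -> type name
    """
    out = {}
    for field, ftype in field_types.items():
        if field in doc:
            out[field] = doc[field]
        elif return_defaults_for_missing:
            # Flag on: every missing field gets its type's default.
            out[field] = TYPE_DEFAULTS[ftype]
        # Flag off (assumed): the missing field is omitted entirely.
    return out
```

With this shape there is deliberately no setting that reproduces the old mixed
behavior (zero for numerics, null for strings), which is the trade-off asked
about in <2>.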
> Export handler returns zero for numeric fields that are not in the original
> doc
> -------------------------------------------------------------------------------
>
> Key: SOLR-9166
> URL: https://issues.apache.org/jira/browse/SOLR-9166
> Project: Solr
> Issue Type: Bug
> Reporter: Erick Erickson
> Assignee: Rohit
> Attachments: SOLR-9166.patch, SOLR-9166.patch
>
>
> From the dev list discussion:
> My original post.
> Zero is different from not
> existing. And let's claim that I want to process a stream and, say,
> facet on an integer field over the result set. There's no way on the
> client side to distinguish between a document that has a zero in the
> field and one that didn't have the field in the first place so I'll
> over-count the zero bucket.
> From Dennis Gove:
> Is this true for non-numeric fields as well? I agree that this seems like a
> very bad thing.
> I can't imagine that a fix would cause a problem with Streaming Expressions,
> ParallelSQL, or others, given that the /select handler is not returning 0 for
> these missing fields (the /select handler is the default handler for the
> Streaming API so if nulls were a problem I imagine we'd have already seen
> it).
> That said, within Streaming Expressions there is a select(...) function which
> supports a replace(...) operation which allows you to replace one value (or
> null) with some other value. If a 0 were necessary one could use a
> select(...) to replace null with 0 using an expression like this
> select(<stream>, replace(fieldA, null, withValue=0)).
> The end result of that would be that the field fieldA would never have a null
> value and for all tuples where a null value existed it would be replaced with
> 0.
> Details on the select function can be found at
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=61330338#StreamingExpressions-select.
> And to answer Dennis's question, null gets returned for string DocValues fields.
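
The over-counting described in the issue can be shown with a small Python
sketch. This is an illustration under stated assumptions, not Solr or
Streaming Expressions code: the tuples, the field name "n", and the helper
names are made up, and replace_null only mimics the effect of
replace(fieldA, null, withValue=0) on the client side.

```python
from collections import Counter


def facet_counts(tuples, field):
    """Count values of `field`, skipping tuples where it is None or absent."""
    return Counter(t[field] for t in tuples if t.get(field) is not None)


def replace_null(tuples, field, with_value):
    """Return copies of the tuples with a missing `field` set to `with_value`."""
    return [
        {**t, field: with_value} if t.get(field) is None else t
        for t in tuples
    ]


# Three docs: doc 1 really holds 0, doc 2 holds 5, doc 3 lacks the field.
# What the client sees when the missing numeric comes back as zero:
zero_for_missing = [{"id": "1", "n": 0}, {"id": "2", "n": 5}, {"id": "3", "n": 0}]
# What the client would see if the missing field came back as null:
null_for_missing = [{"id": "1", "n": 0}, {"id": "2", "n": 5}, {"id": "3", "n": None}]

overcounted = facet_counts(zero_for_missing, "n")  # zero bucket holds 2 docs
accurate = facet_counts(null_for_missing, "n")     # zero bucket holds 1 doc
# A client that wants zeros can opt in explicitly after the fact.
opted_in = facet_counts(replace_null(null_for_missing, "n", 0), "n")
```

Going from null to 0 is always possible on the client side, but once the
export has already turned a missing value into 0 there is nothing left to
tell the two cases apart.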