Dawid Weiss commented on SOLR-12094:

I understand the concept of "streaming" imports, but this just seems wrong to 
me here. An analogy would be XSLT or other technologies where the 
implementation permits an efficient "streaming" mode in certain cases, unless 
the input makes it impossible. 

I perceive a similar situation here: the parser should handle the input 
efficiently when possible, but should also be able to process any type of 
input, even input that cannot be processed without keeping track of some 
history. Sure, an abuse case of millions of split nodes awaiting a single 
attribute is possible, but even then it'd be simpler to just say "yeah, buffer 
up until you can emit the output" than to modify the structure of such a JSON 
(writing a converter so that the nested nodes are always placed at the end of 
the parent).

[~awislowski] do you think you'd be able to modify the patch so that it accepts 
an argument and switches between the 'strict streaming' mode and 'relaxed' 
mode? In 'strict streaming' mode there should be no buffering and the parser 
should complain with an exception if it encounters extra nodes after the split. 
In 'relaxed' mode the parser should buffer up the information until it's 
complete and can be emitted.
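
To make the two modes concrete, here is a minimal sketch of the behaviour I have 
in mind. It is not based on the actual JsonRecordReader code: the class 
SplitModeSketch, the Mode enum, the Event type and the split() method are all 
hypothetical names, and the input is modelled as a flat list of events for a 
single parent object instead of a real JSON token stream.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model for illustration only; this is NOT Solr's JsonRecordReader API.
class SplitModeSketch {
    enum Mode { STRICT_STREAMING, RELAXED }

    // One event inside a single parent object: either a parent-level field
    // (child == null) or a child record found at the split point.
    static final class Event {
        final String name;
        final Object value;
        final Map<String, Object> child;

        private Event(String name, Object value, Map<String, Object> child) {
            this.name = name;
            this.value = value;
            this.child = child;
        }

        static Event field(String name, Object value) {
            return new Event(name, value, null);
        }

        static Event child(Map<String, Object> record) {
            return new Event(null, null, record);
        }
    }

    // Produces one output document per child record, each merged with the parent fields.
    static List<Map<String, Object>> split(List<Event> events, Mode mode) {
        Map<String, Object> parent = new LinkedHashMap<>();
        List<Map<String, Object>> pending = new ArrayList<>(); // filled only in RELAXED mode
        List<Map<String, Object>> out = new ArrayList<>();
        boolean splitSeen = false;

        for (Event e : events) {
            if (e.child != null) {
                splitSeen = true;
                if (mode == Mode.STRICT_STREAMING) {
                    // No buffering: emit with the parent fields seen so far.
                    out.add(merge(parent, e.child));
                } else {
                    // Buffer until the parent object is complete.
                    pending.add(e.child);
                }
            } else {
                if (splitSeen && mode == Mode.STRICT_STREAMING) {
                    throw new IllegalStateException(
                        "parent field '" + e.name + "' found after the split point");
                }
                parent.put(e.name, e.value);
            }
        }
        // End of the parent object: buffered records now see every parent field.
        for (Map<String, Object> child : pending) {
            out.add(merge(parent, child));
        }
        return out;
    }

    private static Map<String, Object> merge(Map<String, Object> parent,
                                             Map<String, Object> child) {
        Map<String, Object> doc = new LinkedHashMap<>(parent);
        doc.putAll(child);
        return doc;
    }

    public static void main(String[] args) {
        List<Event> events = Arrays.asList(
            Event.field("first", "John"),
            Event.child(Collections.singletonMap("subject", "Maths")),
            Event.field("after", "456"));
        System.out.println(split(events, Mode.RELAXED));
    }
}
```

With the example from the issue, 'relaxed' mode would carry "after" into every 
emitted document, while 'strict streaming' mode would fail as soon as it sees 
"after" following the "exams" array. The cost of 'relaxed' mode in this sketch 
is that all child records of one parent are held in memory until the parent 
ends.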

> JsonRecordReader ignores root record fields after the split point
> -----------------------------------------------------------------
>                 Key: SOLR-12094
>                 URL: https://issues.apache.org/jira/browse/SOLR-12094
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: SolrJ
>    Affects Versions: master (8.0)
>            Reporter: Przemysław Szeremiota
>            Priority: Major
>         Attachments: SOLR-12094.patch, SOLR-12094.patch, 
> json-record-reader-bug.patch
> JsonRecordReader, when configured with a split other than top-level, ignores 
> all top-level JSON nodes after the split ends, for example:
> {code}
> {
>   "first": "John",
>   "last": "Doe",
>   "grade": 8,
>   "exams": [
>     {
>         "subject": "Maths",
>         "test": "term1",
>         "marks": 90
>     },
>     {
>         "subject": "Biology",
>         "test": "term1",
>         "marks": 86
>     }
>   ],
>   "after": "456"
> }
> {code}
> Node "after" won't be visible in SolrInputDocument constructed from 
> /update/json/docs.
