> On March 19, 2015, 7:40 a.m., Aman Sinha wrote:
> > exec/java-exec/src/main/java/org/apache/drill/exec/store/easy/json/reader/CountingJsonReader.java, line 36
> > <https://reviews.apache.org/r/32223/diff/1/?file=899462#file899462line36>
> >
> >     The default JsonReader (which is used when skip-all is false) has an
> >     initial while loop to iterate over the tokens; is that not needed here
> >     because you are expecting to be either at end-of-stream or at the
> >     beginning of a record? I am wondering what happens when a single large
> >     record (with either many fields or a large string field) spans a batch
> >     boundary. (I am actually not completely sure if that is allowed, so let
> >     me know if that situation is not going to occur.)
>
> Hanifi Gunes wrote:
>     i) I am not sure why this initial loop in the original reader is useful.
>
>     ii) I think the parser works on the entire JSON stream across batch
>     boundaries. Wide records used to be a problem before auto reallocation
>     came in; now we re-alloc as needed. Besides, since we are not
>     particularly interested in fields and are just counting, the footprint
>     should be small.
>
> Aman Sinha wrote:
>     It would be good to confirm this by working with QA to have a test case,
>     but it is not a blocker for the patch.

I think we have coverage on wide records. We can ask folks to include a
count(*) query on the same data.


- Hanifi


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/32223/#review77027
-----------------------------------------------------------


On March 19, 2015, 5:49 p.m., Hanifi Gunes wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/32223/
> -----------------------------------------------------------
>
> (Updated March 19, 2015, 5:49 p.m.)
>
>
> Review request for drill, Aman Sinha and Parth Chandra.
>
>
> Bugs: DRILL-2193
>     https://issues.apache.org/jira/browse/DRILL-2193
>
>
> Repository: drill-git
>
>
> Description
> -------
>
> DRILL-2193: implement fast count / skip-all semantics for JSON reader
>
> This patch introduces an abstraction for JSON processing and implements an
> efficient counting JSON reader if the query is in skip-all mode (see
> DRILL-2358).
>
>
> Diffs
> -----
>
>   exec/java-exec/src/main/java/org/apache/drill/exec/store/easy/json/JSONRecordReader.java c343177a719b5f36f51bcb2f84d68518ba1ae02f
>   exec/java-exec/src/main/java/org/apache/drill/exec/store/easy/json/JsonProcessor.java PRE-CREATION
>   exec/java-exec/src/main/java/org/apache/drill/exec/store/easy/json/reader/BaseJsonProcessor.java PRE-CREATION
>   exec/java-exec/src/main/java/org/apache/drill/exec/store/easy/json/reader/CountingJsonReader.java PRE-CREATION
>   exec/java-exec/src/main/java/org/apache/drill/exec/vector/complex/fn/JsonReader.java cc5c8af63c6383eb8d2e28a409a3c055bf5cc737
>
> Diff: https://reviews.apache.org/r/32223/diff/
>
>
> Testing
> -------
>
> unit + regression
>
>
> Thanks,
>
> Hanifi Gunes
>
>
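
To illustrate why a counting reader can keep a small footprint even on wide records, here is a minimal sketch. It is not the CountingJsonReader from the patch and does not use Drill's or Jackson's APIs; the class name RecordCounter and its methods are hypothetical. It assumes well-formed input consisting of a sequence of top-level JSON objects, and it only tracks nesting depth and string state, so no field values are ever materialized.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical illustration only; not the reader from the patch under review.
public class RecordCounter {

    // Counts top-level JSON objects by tracking nesting depth.
    // Assumes well-formed input made of consecutive top-level objects.
    public static long countRecords(Reader in) throws IOException {
        long count = 0;
        int depth = 0;            // current nesting of { } and [ ]
        boolean inString = false; // true while inside a quoted string
        boolean escaped = false;  // true right after a backslash in a string
        int c;
        while ((c = in.read()) != -1) {
            if (inString) {
                if (escaped) {
                    escaped = false;   // this character is escaped, skip it
                } else if (c == '\\') {
                    escaped = true;
                } else if (c == '"') {
                    inString = false;  // closing quote
                }
                continue;              // braces inside strings are ignored
            }
            switch (c) {
                case '"':
                    inString = true;
                    break;
                case '{':
                case '[':
                    depth++;
                    break;
                case '}':
                case ']':
                    depth--;
                    if (depth == 0 && c == '}') {
                        count++;       // a top-level record just closed
                    }
                    break;
                default:
                    break;             // field names and values are skipped
            }
        }
        return count;
    }

    // Convenience overload for in-memory input.
    public static long countRecords(String json) {
        try {
            return countRecords(new StringReader(json));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String data = "{\"a\":1}\n{\"b\":{\"c\":\"}{\"},\"d\":[1,{\"e\":2}]}";
        System.out.println(countRecords(data));
    }
}
```

Because the only state is a depth counter and two flags, memory use does not grow with the number of fields or the length of a string value, which is the property the thread relies on for records that span batch boundaries.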
