gianm opened a new pull request, #15693:
URL: https://github.com/apache/druid/pull/15693

   These readers were running a UTF-8 decode on the provided entity to convert
it to a String, then parsing that String as JSON. This patch changes them to
parse the provided entity's input stream directly.
   
   To preserve the helpful error messages that include parse errors, the
readers now open the entity a second time on the error path to re-read the
data. To make this possible, the InputEntity#open contract is tightened to
require that entities can be re-opened, and existing InputEntity
implementations are updated to allow re-opening.
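   A minimal Java sketch of the parse-then-reopen flow described above, assuming a simplified entity interface. `ReopenableEntity`, `StreamParser`, `RowParseException`, `ByteEntity`, and `ParseWithReopen` are hypothetical names for illustration, not Druid's actual `InputEntity` API or reader classes. The happy path hands the stream straight to the parser; only when parsing fails is the entity opened again and decoded to a String for the error message.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical checked exception a parser throws on malformed input.
class RowParseException extends Exception {
  RowParseException(String message) {
    super(message);
  }
}

// Hypothetical, simplified stand-in for InputEntity: every call to open()
// must return a fresh stream positioned at the start of the data.
interface ReopenableEntity {
  InputStream open() throws IOException;
}

// Hypothetical parser that consumes raw bytes from the stream directly.
interface StreamParser<T> {
  T parse(InputStream in) throws IOException, RowParseException;
}

// In-memory entity: re-opening is trivial because the bytes are retained.
class ByteEntity implements ReopenableEntity {
  private final byte[] data;

  ByteEntity(byte[] data) {
    this.data = data;
  }

  @Override
  public InputStream open() {
    return new ByteArrayInputStream(data);
  }
}

class ParseWithReopen {
  static <T> T parse(ReopenableEntity entity, StreamParser<T> parser) throws IOException {
    // Happy path: the parser reads the stream itself; no String is built.
    try (InputStream in = entity.open()) {
      return parser.parse(in);
    } catch (RowParseException e) {
      // Error path only: open the entity a second time and decode it as
      // UTF-8 so the error message can quote the offending data.
      String text;
      try (InputStream again = entity.open()) {
        text = new String(again.readAllBytes(), StandardCharsets.UTF_8);
      }
      throw new IllegalArgumentException("Unable to parse row [" + text + "]", e);
    }
  }
}
```

   In this sketch a successful parse never materializes a String, while a failure costs one extra `open()`. That is why `open()` must be safe to call more than once: an entity backed by a one-shot stream would have to buffer or re-fetch its data to satisfy the contract.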
   
   This patch also renames JsonLineReaderBenchmark to JsonInputFormatBenchmark,
updates it to benchmark all three JSON readers, and adds a case that reads
fields out of the parsed row rather than only creating it.
   
   Benchmarks below. Findings:
   
   - The `reader` and `node_reader` readers are ~15% faster on `parseAndRead`.
`reader` is the default for stream ingest; `node_reader` is used for stream
ingest if `useJsonNodeReader` is set.
   - The `line_reader` wasn't changed in this patch, and its performance is the
same (within the margin of error). It is the default for batch ingest, and is
used for streaming ingest if `assumeNewlineDelimited` is set.
   
   So, the speedups are mainly for streaming ingest. But #15681 has a similar 
speedup for `line_reader` if that's the one you care about!
   
   ```
   master
   
   Benchmark                              (readerTypeString)  Mode  Cnt     Score     Error  Units
   JsonInputFormatBenchmark.parseAndRead              reader  avgt    5  3148.287 ± 117.748  us/op
   JsonInputFormatBenchmark.parseAndRead         node_reader  avgt    5  3232.287 ±  20.667  us/op
   JsonInputFormatBenchmark.parseAndRead         line_reader  avgt    5  3085.638 ±  45.131  us/op
   
   patch
   
   Benchmark                              (readerTypeString)  Mode  Cnt     Score    Error  Units
   JsonInputFormatBenchmark.parseAndRead              reader  avgt    5  2656.737 ± 65.348  us/op
   JsonInputFormatBenchmark.parseAndRead         node_reader  avgt    5  2659.078 ± 53.231  us/op
   JsonInputFormatBenchmark.parseAndRead         line_reader  avgt    5  3017.010 ± 63.724  us/op
   ```
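   The percentages quoted above can be checked against the table. This sketch (hypothetical helper names) computes the reduction in average time per operation from master to patch, and checks whether the two `Score ± Error` ranges overlap. The overlap test is a rough heuristic, not a formal significance test.

```java
class BenchmarkDelta {
  // Percentage reduction in average time per op (us/op), master -> patch.
  static double percentFaster(double masterUs, double patchUs) {
    return 100.0 * (masterUs - patchUs) / masterUs;
  }

  // True if [s1 - e1, s1 + e1] and [s2 - e2, s2 + e2] share any point.
  static boolean intervalsOverlap(double s1, double e1, double s2, double e2) {
    return Math.max(s1 - e1, s2 - e2) <= Math.min(s1 + e1, s2 + e2);
  }
}
```

   Plugging in the numbers gives roughly 15.6% less time per op for `reader`, 17.7% for `node_reader`, and 2.2% for `line_reader`. Only the `line_reader` ranges overlap (3040.507 to 3130.769 on master vs. 2953.286 to 3080.734 with the patch), which is consistent with calling that difference within the margin of error.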
