FrankChen021 opened a new pull request #10980:
URL: https://github.com/apache/druid/pull/10980


   This PR fixes #10945
   
   ### Description
   
   /druid/v2 endpoint supports jackson smile encoding both for request and 
response. But there's a bug there when processing a request with 
'Accept:application/x-jackson-smile' header.
   
   Current implementation takes 'Accept' parameter, which indicates server that 
smile encoding is accepted by client to decode the HTTP response, to decode a 
JSON request. According to HTTP protocol on 'Content-Type' and 'Accept', server 
should alway decode the body according to 'Content-Type' header, while encode 
response by 'Accept' header( if not provided, 'Accept' defaults to 
'Content-Type')
   
   The reason why current unit tests fails to detect this bug is because test 
cases in `QueryResourceTest` wrongly inject a Json ObjectMapper instead of 
Smile ObjectMapper for Smile decoding. All related test cases are fixed in this 
PR too.
   
   ### Smile encoding test
   
   I have never heard of Smile encoding before, and there's little doc in Druid 
to describe if we should exploit it. So I also make some investigation how it 
saves response sizes in Druid to see if this feature could be used in our 
environment.
   
   ### Test data
   
   Wikipedia data shipped by Druid
   
   ### Tests
   
   Sending following scan query by CURL with following parameters
   - `limit` parameter range in [100,200,300,400,500]
   - Accept:application/x-jackson-smile turned on and off
   - Content-Encoding:gzip turned on and off
   
   observe 'size_download' for each response outputted by CURL 
   
   ```
   {
     "queryType": "scan",
     "dataSource": {
       "type": "table",
       "name": "wikipedia"
     },
     "intervals": {
       "type": "intervals",
       "intervals": [
         "-146136543-09-08T08:23:32.096Z/146140482-04-24T15:36:27.903Z"
       ]
     },
     "virtualColumns": [],
     "batchSize": 20480,
     "limit": 100,
     "legacy": false,
     "descending": false,
     "granularity": {
       "type": "all"
     }
   }
   ```
   
   #### Result
   
   Limit |Response Bytes in JSON(GZip) | Response Bytes in JSON(No-GZip) | 
Response Bytes in Smile(GZip) | Response Bytes in Smile(No-GZip)
   -- | -- | -- | -- | --
   100 | 6753 | 43084 | 6550 | 17874
   200 | 12485 | 86427 | 11646 | 35781
   300 | 18879 | 129798 | 17316 | 53797
   400 | 24494 | 172290 | 22208 | 70896
   500 | 30886 | 215331 | 27920 | 88595
   
   The tests above do not cover all queries in druid, but the data in the table 
generally reflects advantages of Smile encoding in one aspect.
   
   From the table we can see that:
   - when gzip is not enabled in response, Smile encoding(column 5) takes about 
only 41% space over JSON(column 3)
   - when gzip is enabled, Smile encoding performs a little improvement(column 
4 vs columns 2)
   
   In contrast to gzip which requires more resources on server side, Smile 
saves serialization time compared to JSON according to [some 
tests](http://www.theotherian.com/2013/12/kryo-vs-smile-vs-json-shootout-part-1.html).
 And there's no too much work to do expect replacing default object mapper to 
smile object mapper which usually is one line of work in many dependency 
injection system, I think Smile encoding is worthy to try.
   
   ### Note
   
   SQL endpoint does not support Smile encoding yet(#10931), this PR does not 
add Smile encoding to SQL endpoint. I think we could have a discussion. Based 
on the tests shown above, IMO, it's reasonable to provide such support because 
SQL query is more popular than native query, at least in our environment.
   
   <hr>
   
   ##### Key changed/added classes in this PR
    * `ResourceIOWriter` is extracted from `ResourceIOReaderWriter` to hold 
response content-type and right ObjectMapper while original 
ResourceIOReaderWriter holds right ObjectMapper to deserialize input request.
   
   <hr>
   
   
   This PR has:
   - [X] been self-reviewed.
      - [ ] using the [concurrency 
checklist](https://github.com/apache/druid/blob/master/dev/code-review/concurrency.md)
 (Remove this item if the PR doesn't have any relation to concurrency.)
   - [ ] added documentation for new or modified features or behaviors.
   - [ ] added Javadocs for most classes and all non-trivial methods. Linked 
related entities via Javadoc links.
   - [ ] added or updated version, license, or notice information in 
[licenses.yaml](https://github.com/apache/druid/blob/master/dev/license.md)
   - [X] added comments explaining the "why" and the intent of the code 
wherever would not be obvious for an unfamiliar reader.
   - [X] added unit tests or modified existing tests to cover new code paths, 
ensuring the threshold for [code 
coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md)
 is met.
   - [ ] added integration tests.
   - [X] been tested in a test Druid cluster.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to