yashmayya commented on PR #13097:
URL: https://github.com/apache/pinot/pull/13097#issuecomment-2099750401

   > I'm not sure if this is the correct behavior.
   JSON_FORMAT should be able to convert any object into JSON. If the value 
itself is abc, then the proper JSON version should be "abc".
   
   That makes sense, I hadn't considered that.
   
   > If user wants to generate another level of json over a JSON string, we 
should still allow that, and the new JSON is also valid
   
   Are there any valid use cases for that?
   
   > I don't fully follow the example in the description. If the value is 
already JSON string, user shouldn't call JSON_FORMAT again
   
   Yeah, I agree in principle but this leads to the confusing behavior that I 
tried to document in the PR description. Let me try again below.
   
   <hr>
   
   <h2> Scenario 1</h2>
   
   - There is a table with two columns in its schema - `data` (JSON) and 
`alias` (STRING). `alias` is a derived column using the following ingestion 
transform function: `JSON_PATH_STRING(JSON_FORMAT(data), '$.alias')`.
   - A sample value is: 
`{"data":{"alias":"student1","age":24,"name":{"full.name":"Peter1","nick.name":"Pete1"}}}`.
   - This works as expected and the above value is ingested fine, with the 
`alias` column being set to `student1`.
   - In this case, the `JSON_FORMAT` function is called on the `Map` value 
representing `data` during ingestion, which results in the JSON string and that 
works fine with the `JSON_PATH_STRING` function.
   
   
   <h2> Scenario 2</h2>
   
   - There is a table with one column in its schema - `data` (JSON). A sample 
value that has already been ingested is 
`{"data":{"alias":"student1","age":24,"name":{"full.name":"Peter1","nick.name":"Pete1"}}}`.
   - Now, the user wants to add a new derived column `alias` with the same 
logic as the above scenario. Now, the user would expect that the same ingestion 
transform function should work here as well with segment reload.
   - If the user updates the table config to add the ingestion transform 
function: `JSON_PATH_STRING(JSON_FORMAT(data), '$.alias')` (and also updates 
the schema to add the new `alias` column), the segment reload fails with an NPE 
[here](https://github.com/apache/pinot/blob/070e3dbd81f8223e5ad3872a3b8ae15db87bc543/pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/index/loader/defaultcolumn/BaseDefaultColumnHandler.java#L592).
   - The reason is that in this case, since the data has already been ingested 
and stored in a segment, the value that is passed to the `JSON_FORMAT` function 
in this case is a String (rather than a `Map` as in the above scenario) since 
JSON columns are stored as strings internally. This results in an escaped 
string value that is treated as a literal JSON string value by the 
`JSON_PATH_STRING` function thus resulting in an unexpected `null` value being 
returned.
   
   <hr>
   
   Is this working as expected and documented somewhere? Or should we solve 
this issue in a different way to what this PR was attempting?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to