[jackson-user] Re: How to represent JSON to allow efficient manipulations

Burak Emre Kabakcı Wed, 27 Feb 2019 08:08:58 -0800

Although I haven't done any benchmarking, I don't think there is a huge 
performance benefit in using ObjectNode over Map<String, Object>, where 
Object can be any mappable JSON value (Number, String, Boolean, etc.). 
There are already native serializers for List, Map, and all primitive 
values, so all Jackson has to do is find the serializer for the value's 
class (Object.getClass()), which is pretty straightforward because 
Jackson caches the serializers under the hood. Thanks to the caching, the 
hot path is often fast enough. In our workload, the bottleneck is usually 
IO.
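
To illustrate what I mean by a class-keyed serializer lookup with caching, 
here is a toy sketch in plain Java. It is not Jackson's actual 
implementation, which is considerably more involved; the class name 
SerializerCacheSketch and all of its methods are made up for this example, 
and the string escaping is deliberately minimal.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BiConsumer;

// Hypothetical toy serializer: illustrates the idea only, not Jackson's code.
final class SerializerCacheSketch {
    // Writers are resolved once per runtime class and then reused.
    private static final Map<Class<?>, BiConsumer<Object, StringBuilder>> CACHE =
            new ConcurrentHashMap<>();

    static String toJson(Object value) {
        StringBuilder out = new StringBuilder();
        write(value, out);
        return out.toString();
    }

    private static void write(Object value, StringBuilder out) {
        if (value == null) {
            out.append("null");
            return;
        }
        // Hot path: a single map lookup keyed by value.getClass().
        CACHE.computeIfAbsent(value.getClass(), SerializerCacheSketch::resolve)
             .accept(value, out);
    }

    // Slow path: runs only the first time a given class is seen.
    private static BiConsumer<Object, StringBuilder> resolve(Class<?> type) {
        if (Map.class.isAssignableFrom(type)) return SerializerCacheSketch::writeMap;
        if (List.class.isAssignableFrom(type)) return SerializerCacheSketch::writeList;
        if (Number.class.isAssignableFrom(type) || type == Boolean.class) {
            return (v, out) -> out.append(v);
        }
        if (type == String.class) return (v, out) -> writeString((String) v, out);
        throw new IllegalArgumentException("No serializer for " + type.getName());
    }

    private static void writeMap(Object value, StringBuilder out) {
        out.append('{');
        boolean first = true;
        for (Map.Entry<?, ?> e : ((Map<?, ?>) value).entrySet()) {
            if (!first) out.append(',');
            first = false;
            writeString(String.valueOf(e.getKey()), out);
            out.append(':');
            write(e.getValue(), out);
        }
        out.append('}');
    }

    private static void writeList(Object value, StringBuilder out) {
        out.append('[');
        boolean first = true;
        for (Object item : (List<?>) value) {
            if (!first) out.append(',');
            first = false;
            write(item, out);
        }
        out.append(']');
    }

    // Minimal escaping: only quotes and backslashes are handled.
    private static void writeString(String s, StringBuilder out) {
        out.append('"');
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '"' || c == '\\') out.append('\\');
            out.append(c);
        }
        out.append('"');
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("id", 42);
        doc.put("tags", List.of("a", "b"));
        doc.put("active", true);
        System.out.println(toJson(doc)); // {"id":42,"tags":["a","b"],"active":true}
    }
}
```

After the first document, every later value of an already-seen class skips 
resolve() entirely, which is why I would expect the per-value dispatch cost 
to be small compared to IO.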


Avro requires a schema, but you can define the schema at runtime. If you 
have something like a metadata store and also want to validate the JSON 
data, it might be more convenient, but if you don't know anything about 
the schema, it's not an option, as I explained in my first message. In that 
case, I would use JsonNode. 

How do you want to manipulate the data? Are you going to discard some 
attributes, alter the existing ones, or generate new JSON data from the 
existing data?
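
To make those three cases concrete, here is a minimal sketch on the 
Map<String, Object> representation. The class name 
JsonMapManipulationSketch, the method names, and the field names ("id", 
"secret", "fieldCount") are hypothetical. With JsonNode, the first two 
cases would roughly correspond to ObjectNode.remove and ObjectNode.put.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helpers; each returns a shallow copy and leaves the input untouched.
final class JsonMapManipulationSketch {
    // Case 1: discard an attribute.
    static Map<String, Object> discard(Map<String, Object> doc, String field) {
        Map<String, Object> copy = new LinkedHashMap<>(doc);
        copy.remove(field);
        return copy;
    }

    // Case 2: alter an existing attribute (or add it if absent).
    static Map<String, Object> alter(Map<String, Object> doc, String field, Object newValue) {
        Map<String, Object> copy = new LinkedHashMap<>(doc);
        copy.put(field, newValue);
        return copy;
    }

    // Case 3: generate a new document from the existing one.
    static Map<String, Object> derive(Map<String, Object> doc) {
        Map<String, Object> out = new LinkedHashMap<>();
        out.put("id", doc.get("id"));
        out.put("fieldCount", doc.size());
        return out;
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("id", 7);
        doc.put("secret", "x");
        System.out.println(discard(doc, "secret")); // {id=7}
        System.out.println(alter(doc, "id", 8));    // {id=8, secret=x}
        System.out.println(derive(doc));            // {id=7, fieldCount=2}
    }
}
```

Which of these dominates matters: if most transformations only drop or 
rename a few attributes, copying the whole tree each time may cost more 
than the choice between Map and JsonNode.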

On Wednesday, February 27, 2019 at 3:58:42 PM UTC+3, [email protected] 
wrote:
>
> Thanks for the reply Burak.
>
> You are indeed right that JsonNode employs standard Java Map and List 
> objects. Note, though, that each JsonNode includes specialized 
> serialize(JsonGenerator, SerializerProvider) methods, which I think are way 
> better than reflection-based blind serialization. While this doesn't bring 
> any memory efficiency, it certainly yields better processing efficiency. 
> Another potential advantage I was considering for JsonNode is taking 
> advantage of its object pooling capabilities, for example (in pseudocode):
>
> try (var sessionFactory = jsonNodeFactory.clone()) {
>     JsonNode oldNode = sessionFactory.read(jsonText);
>     JsonNode newNode = transform(sessionFactory, oldNode);
>     byte[] newNodeBytes = sessionFactory.write(newNode);
>     persist(newNodeBytes);
> }
>
> Though I am not sure if this is possible at all.
>
> You mentioned using Avro to represent JSON in memory. To the best of 
> my knowledge, Avro requires a schema. Maybe I am missing something. Would 
> you mind elaborating a little bit on your idea here?
>
> Best.
>
> On Wednesday, February 27, 2019 at 12:48:07 PM UTC+1, Burak Emre Kabakcı 
> wrote:
>>
>> 1. JsonNode doesn't actually have any magic under the hood. It uses 
>> Map<String, JsonNode> for objects, List<JsonNode> for arrays, and wrapper 
>> objects for primitive types. Therefore, it's not memory efficient IMO, so it 
>> really depends on your priorities:
>>    1.1. If you care more about code complexity than about memory 
>> bottlenecks, or don't really have a schema, I would stick with JsonNode, 
>> as it's feature-rich and well-documented.
>>    1.2. If you care about performance and don't really have the 
>> option to use class data binding, you may write your own custom parser and 
>> deserialize the JSON blob into a more compact representation such as Apache 
>> Avro rather than JsonNode. If you have a metadata store, you can basically 
>> parse the JSON blob, validate it, and convert it into Avro instances, which 
>> will occupy less heap memory. We prefer this approach as it also provides 
>> us with a native way to validate the JSON blob.
>>
>> 2. AFAIK, JsonNodeFactory is thread-safe and stateless but it would be 
>> better if someone who has experience with it could answer this question.
>>
>> On Tuesday, February 26, 2019 at 11:48:13 PM UTC+3, [email protected] 
>> wrote:
>>>
>>> Hello,
>>>
>>> In a code base I have been working on for more than a year, we receive 
>>> JSON (as byte[] from a queue), we transform the JSON, and we persist it (to 
>>> Elasticsearch). In the initial design, tempted by its convenience, we made 
>>> the mistake of representing the JSON as java.lang.Object in this JSON 
>>> pipeline. Transformation functions receive Object and spit out Object. 
>>> This introduced other problems (e.g., certain types like Set<V> do 
>>> not map 1-to-1 to JSON, variable types do not properly communicate the 
>>> intent of the value, lost caching and object pooling opportunities, etc.) 
>>> that revealed themselves only in pretty late stages of development. If I 
>>> were starting from scratch today, I would pick Jackson's JsonNode as 
>>> the only allowed JSON type. However, getting off the beaten path might raise 
>>> other issues that we cannot foresee right now, and this is where my question 
>>> comes into play.
>>>
>>>    1. What are the best practices to represent JSON documents to allow 
>>>    efficient read/write/update operations? (Note that in our case there is 
>>>    no schema, hence no class data binding.)
>>>    2. If one were to use JsonNode, what are the best practices for using 
>>>    JsonNodeFactory instances? A single global factory? A thread-local 
>>>    factory? No factory but explicit JsonNode::new calls?
>>>
>>> Thanks in advance.
>>> Best.
>>>
>>

