Re: [jackson-user] Re: How to represent JSON to allow efficient manipulations

Tatu Saloranta Fri, 01 Mar 2019 21:31:43 -0800

On Wed, Feb 27, 2019 at 4:58 AM <[email protected]> wrote:
>
> Thanks for the reply Burak.
>
> You are indeed right that JsonNode employs standard Java Map, List objects. 
> Though note that each JsonNode includes specialized serialize(JsonGenerator, 
> SerializerProvider) methods which I think way better than reflection-based 
> blind serialization. While this doesn't bring any memory efficiency, it 
> certainly yields better processing efficiency. Another


For what it is worth, various processing models ("untyped" Objects,
JsonNode "trees") are benchmarked, for example, here:

https://github.com/FasterXML/jackson-benchmarks

and specifically

https://github.com/FasterXML/jackson-benchmarks/blob/master/results-pojo-2.9-home.txt

shows results.

Results for reading/writing speeds between two approaches:

c.f.j.p.json.JsonStdReadVanilla.readUntypedMediaItem    thrpt  100  409050.888 ± 3215.011  ops/s
c.f.j.p.json.JsonStdReadVanilla.readNodeMediaItem       thrpt  100  408543.442 ± 4742.736  ops/s

c.f.j.p.json.JsonStdWriteVanilla.writeUntypedMediaItem  thrpt  100  743788.893 ± 5046.760  ops/s
c.f.j.p.json.JsonStdWriteVanilla.writeNodeMediaItem     thrpt  100  806293.628 ± 4913.396  ops/s

These suggest that reading is essentially as fast in both cases, and
that for writing `JsonNode` is marginally faster (5-10%) for the test payload.
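
To make the two processing models concrete, here is a minimal sketch,
assuming jackson-databind 2.x is on the classpath. The class and method
names (`TwoModels`, `readUntyped`, `readTree`, `write`) are made up for
illustration and are not taken from the benchmark code.

```java
import java.io.IOException;

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

public class TwoModels {
    // ObjectMapper is thread-safe once configured, so one instance is shared.
    private static final ObjectMapper MAPPER = new ObjectMapper();

    // "Untyped" model: JSON objects become Maps, arrays become Lists,
    // scalars become String / Number / Boolean / null.
    static Object readUntyped(byte[] json) throws IOException {
        return MAPPER.readValue(json, Object.class);
    }

    // Tree model: the same content as a JsonNode tree.
    static JsonNode readTree(byte[] json) throws IOException {
        return MAPPER.readTree(json);
    }

    // Both representations are written back with the same call.
    static byte[] write(Object value) throws IOException {
        return MAPPER.writeValueAsBytes(value);
    }
}
```

Either result can be passed to `write()`, so switching a pipeline from
one model to the other mostly changes the transformation code in
between, not the I/O at the edges.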

> potential advantage I was considering for JsonNode is taking advantage of its 
> object pooling capabilities such as (in pseudo code)
>
> try (var sessionFactory = jsonNodeFactory.clone()) {
>     JsonNode oldNode = sessionFactory.read(jsonText);
>     JsonNode newNode = transform(sessionFactory, oldNode);
>     byte[] newNodeBytes = sessionFactory.write(newNode);
>     persist(newNodeBytes);
> }
>
> Though I am not sure if this is possible at all.

`JsonNodeFactory` is indeed thread-safe and does no object pooling:
the only reuse is for singleton values of `true`, `false` and `null`
(with `NullNode`).
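
In practice this means a single shared factory is enough. A minimal
sketch, assuming jackson-databind 2.x; `SharedFactory` and `build` are
hypothetical names used only for illustration:

```java
import com.fasterxml.jackson.databind.node.JsonNodeFactory;
import com.fasterxml.jackson.databind.node.ObjectNode;

public class SharedFactory {
    // The default factory instance; safe to share across threads.
    static final JsonNodeFactory F = JsonNodeFactory.instance;

    static ObjectNode build(String id) {
        // A fresh ObjectNode is allocated on every call: nothing is pooled.
        ObjectNode node = F.objectNode();
        node.put("id", id);
        // booleanNode() and nullNode() hand back shared singleton instances.
        node.set("active", F.booleanNode(true));
        node.set("deleted", F.nullNode());
        return node;
    }
}
```

Two calls to `build("a")` produce distinct but equal trees, while
`F.booleanNode(true)` and `F.nullNode()` return the same instances
every time.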

As to memory use, `JsonNode` has an additional wrapper object for
`Number`s and `String`s compared to the `Object` approach, but that's
probably irrelevant for most usage.

I think the main question to me would be convenience: for traversal
and modifications `JsonNode` is superior to dealing with `Map`s and
`List`s, especially when using the null-safe `path()` and `at()` methods,
with which you can safely traverse paths (the result is a
`MissingNode` if there's no value at the specified location).
And with `at()` you can use `JsonPointer`, which is both convenient and
efficient (you can reuse thread-safe pointer instances, although even
creating one from a String is quite well optimized).
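
A minimal sketch of both styles of traversal, assuming jackson-databind
2.x. The class name, the helper methods and the
`/media/size/width` document layout are invented for illustration:

```java
import com.fasterxml.jackson.core.JsonPointer;
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

public class SafeTraversal {
    private static final ObjectMapper MAPPER = new ObjectMapper();

    // Compiled once and reused; JsonPointer instances are immutable.
    private static final JsonPointer WIDTH =
            JsonPointer.compile("/media/size/width");

    // path() never returns null: a missing step yields a MissingNode,
    // so the chain cannot throw NullPointerException.
    static int widthViaPath(JsonNode root) {
        return root.path("media").path("size").path("width").asInt(-1);
    }

    // at() resolves the whole pointer in one call, again returning
    // a MissingNode when nothing is found at that location.
    static int widthViaPointer(JsonNode root) {
        return root.at(WIDTH).asInt(-1);
    }

    public static void main(String[] args) throws Exception {
        JsonNode root = MAPPER.readTree("{\"media\":{\"size\":{\"width\":640}}}");
        System.out.println(widthViaPath(root));
        System.out.println(widthViaPointer(root));
        System.out.println(root.at("/media/missing/width").isMissingNode());
    }
}
```

The `-1` passed to `asInt()` is only the fallback used when the node is
missing or not numeric; callers that need to distinguish "absent" from
a real value can check `isMissingNode()` instead.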

> You mentioned about using Avro to represent JSON in memory. To the best of my 
> knowledge, Avro requires a schema. Maybe I am missing something. Would you 
> mind elaborating a little bit on your idea here?
>
> Best.

-+ Tatu +-

>
> On Wednesday, February 27, 2019 at 12:48:07 PM UTC+1, Burak Emre Kabakcı 
> wrote:
>>
>> 1. JsonNode doesn't actually have any magic under the hood. It uses 
>> Map<String, JsonNode> for objects, List<JsonNode> for arrays and wrapper 
>> objects for primitive types. Therefore, it's not memory efficient IMO so it 
>> actually depends on your priorities:
>>    1.1. If you care about the code complexity compared to memory bottleneck 
>> or doesn't really have any schema, I would stick with JsonNode as it's 
>> feature-rich and well-documented.
>>    1.2. If you care about the performance and doesn't really have any option 
>> to use class data binding, you may write your custom parser and deserialize 
>> the JSON blob into a more compact representation such as Apache Avro rather 
>> than JsonNode. If you have a metadata store, you can basically parse the 
>> JSON blob, validate it and convert them into Avro instances which will 
>> occupy less heap memory. We prefer this approach as it also provides us a 
>> native way to validate the JSON blob.
>>
>> 2. AFAIK, JsonNodeFactory is thread-safe and stateless but it would be 
>> better if someone who has experience with it could answer this question.
>>
>> On Tuesday, February 26, 2019 at 11:48:13 PM UTC+3, [email protected] 
>> wrote:
>>>
>>> Hello,
>>>
>>> In a code base I have been working on for more than a year, we receive JSON 
>>> (as byte[] from queue), we transform JSON, and we persist JSON (to 
>>> Elasticsearch). In the initial design, tempted by its convenience, we made 
>>> the mistake of representing the JSON as java.lang.Object in this JSON 
>>> pipeline. Transformation functions receive Object and spit out Object. 
>>> Though this introduced other problems (e.g., certain types like Set<V> do 
>>> not map 1-to-1 to JSON, variable types do not properly communicate the 
>>> intent of the value, lost caching and object pooling opportunities, etc.) 
>>> revealed themselves in pretty late stages of the development. If I would 
>>> have been starting from scratch today, I would pick Jackson's JsonNode as 
>>> the only allowed JSON type. Though getting off the beaten path might raise 
>>> other issues that we cannot oversee right now and this is where my question 
>>> comes into play.
>>>
>>> What are the best practices to represent JSON documents to allow efficient 
>>> read/write/update operations? (Note that in our case there is no schema, 
>>> hence no class data binding.)
>>> If one would use JsonNode, what are the best practices for using 
>>> JsonNodeFactory'ies? A single global factory? A thread-local factory? No 
>>> factory but explicit JsonNode::new calls?
>>>
>>> Thanks in advance.
>>> Best.
>
> --
> You received this message because you are subscribed to the Google Groups 
> "jackson-user" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to [email protected].
> To post to this group, send email to [email protected].
> For more options, visit https://groups.google.com/d/optout.
