Hi Simo,
Take this with several grains of salt as I don't know the internals of the
feature processing, but looking at your email from a generic "how do I
process a JSON file" perspective, the proposed pipeline still seems
inefficient.

Ideally, IMO, the substitution would be done as a filter applied to the
stream of parser events. That way the entire String is not held in memory
-- only the parsed DOM. I suspect it is also "safer" in the sense that you
can more tightly control the context in which interpolation occurs (for
example, interpolation should be allowed in string values, but not keys);
the flip side is that it also is more restrictive, i.e. supporting
interpolation of non-String values would be non-trivial (then again, doing
this would make the original document invalid JSON so I'm not sure this is
a real use case). I would suggest taking a look at Jackson's
JsonParserDelegate.
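
To make that a bit more concrete, here is a rough sketch of what I have in
mind. It is not tested against the feature code: the class name, the
`read` helper and the `${name}` placeholder syntax are all assumptions on my
part, and only `JsonParserDelegate`, `JsonParser`, `JsonToken` and
`ObjectMapper` are actual Jackson types.

```java
import com.fasterxml.jackson.core.JsonParser;
import com.fasterxml.jackson.core.JsonToken;
import com.fasterxml.jackson.core.util.JsonParserDelegate;
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

import java.io.IOException;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical filter: interpolates ${name} placeholders in string values only. */
public class InterpolatingParser extends JsonParserDelegate {

    // Assumed placeholder syntax; the real substitution syntax may differ.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([^}]+)\\}");

    private final Map<String, String> variables;

    public InterpolatingParser(JsonParser delegate, Map<String, String> variables) {
        super(delegate);
        this.variables = variables;
    }

    @Override
    public String getText() throws IOException {
        String text = super.getText();
        // Keys arrive as FIELD_NAME tokens, so they pass through untouched.
        return currentToken() == JsonToken.VALUE_STRING ? interpolate(text) : text;
    }

    private String interpolate(String text) {
        Matcher m = PLACEHOLDER.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // Unknown variables are left exactly as written.
            String value = variables.getOrDefault(m.group(1), m.group());
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    /** Reads a document straight into a Jackson tree, substituting on the way. */
    public static JsonNode read(String json, Map<String, String> variables) throws IOException {
        ObjectMapper mapper = new ObjectMapper();
        try (JsonParser parser =
                 new InterpolatingParser(mapper.getFactory().createParser(json), variables)) {
            return mapper.readTree(parser);
        }
    }
}
```

The resulting tree could then be used for the `id` check, the schema
validation and the mapping without going back to a String. One caveat: only
`getText()` is overridden here, so a consumer that reads strings through
other accessors such as `getValueAsString()` or `getTextCharacters()` would
bypass the substitution unless those are overridden as well.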

Regards,
Justin

On Mon, Nov 12, 2018 at 2:46 PM Simone Tripodi <simonetrip...@apache.org>
wrote:

> Hi all mates,
>
> during the last couple of months the work we've been doing on Feature
> file processing has been HUGE, and the iterations to refine the pipeline
> process introduced some "overhead" operations we can improve. What we
> currently do is:
>
>  * the pre processor starts by reading the whole file to memory,
> storing it in a String reference;
>  * parse the JSON file to create the javax.json DOM and check whether the
> `id` property is missing, adding it if necessary and then serializing it
> to a string again;
>  * JSON Schema validation takes the string as input, creates the
> Jackson DOM to validate it against the defined schema;
>  * if schema validation is OK, the Substitution takes the JSON string
> as input to interpolate variables, which creates a new JSON string
> representation;
>  * the JS Min takes the JSON string representation and converts it to
> a new JSON string representation where useless stuff is removed;
>  * at that point, the JSON Feature reader takes the final string and
> creates a javax.json DOM once again to map it to a Feature instance.
>
> My proposal is to improve our pipeline a little in order to speed up
> the JSON processing, as follows:
>
>  * the JS Min starts by reading the whole file to memory, storing it
> in a String reference;
>  * the Substitution takes the JSON string as input to interpolate
> variables, which creates a new JSON string representation;
>  * a Jackson DOM will be created in order to check whether the `id`
> property is missing, adding it if necessary;
>  * the Jackson DOM will be validated against the defined schema;
>  * the Jackson DOM will be mapped to a Feature instance.
>
> WDYT?
>
> Many thanks in advance!
> ~Simo
>
> http://people.apache.org/~simonetripodi/
> http://twitter.com/simonetripodi
>
