[
https://issues.apache.org/jira/browse/CASSANDRA-7970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14333620#comment-14333620
]
Sylvain Lebresne commented on CASSANDRA-7970:
---------------------------------------------
bq. I attempted to improve the handling of collection serialization formats
\[...\]. I have mixed feelings about the results.
Me too :). I do think some clarification around this can be done, but I'm not
entirely sure having a special version for collection serialization is
necessarily the best way to go. Overall, I'd prefer leaving such refactor out
of this ticket.
Other remarks on the patch:
* For the textile doc, I'd split out a new JSON section and only link to that
from the {{INSERT}} and {{SELECT}} doc. Documenting JSON in the middle of the
statements feels slightly confusing, it would feel clearer to me to concentrate
it in a separate section, one that also start by stating what JSON support is
all about (i.e. really just a convenience for getting data from/to a JSON
form). Also, the doc is fully up to date with the patch (mention that
{{VALUES}} for {{INSERT}} is optional when using JSON for instance) -
{{cql3handling.py}} is not up to date on the {{INSERT}} syntax.
* When doing {{INSERT INTO foo JSON ?}}, I'd maybe use {{\[json\]}} as receiver
name to be consistent with what we do for {{LIMIT}}, {{TIMESTAMP}} and {{TTL}}.
Also, UpdateStatement.ParsedInsert.jsonReceiverSpec is unused. Same for
{{SELECT}} (for consistency with {{\[applied\]}}).
* Should make {{FromJsonFct/toJsonFct}} extends {{NativeScalarFunction}}
directly.
* It's not entirely this patch fault but I don't understand why we need
{{FunctionName.equalsNativeFunction}} (why not just use equals)? The only
difference is that if {{this}} doesn't have a keyspace, it doesn't check the
other name doesn't either, but that just feels wrong to me tbh.
* For UDT, I think we need to care about case sensitivity in {{fromJSONObject}}.
* In TupleType/UDTType.fromJSONObject, we should force the format to V3. It's
also a tad weird to use CollectionSerializer.writeValue to write the UDT. Why
not use the TupleValue.buildValue() function? Or my next suggestion.
* If we made {{fromJSONObject}} return a {{Term}}, we'd avoid serializing
collection to re-deserialize them right away. We'd also leave the serialization
of UDT/Tuple to UserTypes/TupleTypes.DelayedValues (doesn't hurt to concentrate
it in one place as much as possible) and save a bunch of changes (in
ColumnCondition, Lists, Sets and Maps).
* I'd prefer limiting the json pollution in {{Selection}}. We (mostly) really
need to bother about JSON when writing the ResultSet, so we could simply pass
the isJson to the {{resultSetBuilder}} method (somehow it makes more sense to
me to call {{rowToJson}} in the builder anyway). The one other place we need
it is {{getResultMetadata}}, but we could also pass a {{isJson}} flag there and
either build the metadata if it's json (not a big deal) or find a simple way to
cache the json metadata per-table (since it's always the same thing except for
the keyspace and table name).
* {{FromJsonFct}} should probably use the new {{ExecutionException}} from
CASSANDRA-8528 rather than {{InvalidRequestException}}.
* For {{INSERT JSON}}, I think having a ParsedInsertJson separate from
ParsedInsert makes things a bit cleaner. I also feel that Json.java could
encapsulate a bit better the necessary specificity of {{INSERT JSON}}. As I'm
not sure how to express clearly what I mean (and because I wasn't sure that
what I had in mind would really work), I've push a commit at
[here|https://github.com/pcmanus/cassandra/commits/7970] shows what I have in
mind. It's not really different from the initial approach but it feels cleaner
overall. How do you feel about it?
* Any particular reason for having {{DecimalType}} and {{IntegerType}} quote
their values (they use the default {{toJSONString}}). I would have expected
them to translate to JSON numbers since as far as I can tell JSON doesn't
impose any size limit on its numbers.
* Maybe we could abstract slighty our use of jackson (put the helpers we need
in Json.java maybe?), so that 1) we have only one place to change if we upgrade
jackson and the API change (or we want to change of library) and 2) we save
creating multiple ObjectMapper or JsonStringEncoder objects.
Also two minor nits:
* The patch makes ResultSet.COUNT_COLUMN public for no reason it seems.
* Out of curiosity, was there a profound reason to make Term.Terminal take the
protocol version only (instead of the full QueryOptions)? If it's just because
it's the only thing used, that's fine, but just fyi passing QueryOptions was
done so we don't have to change everything again if a future Term needs
something else than the protocol version.
> JSON support for CQL
> --------------------
>
> Key: CASSANDRA-7970
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7970
> Project: Cassandra
> Issue Type: New Feature
> Components: API
> Reporter: Jonathan Ellis
> Assignee: Tyler Hobbs
> Labels: client-impacting, cql3.3, docs-impacting
> Fix For: 3.0
>
> Attachments: 7970-trunk-v1.txt
>
>
> JSON is popular enough that not supporting it is becoming a competitive
> weakness. We can add JSON support in a way that is compatible with our
> performance goals by *mapping* JSON to an existing schema: one JSON documents
> maps to one CQL row.
> Thus, it is NOT a goal to support schemaless documents, which is a misfeature
> [1] [2] [3]. Rather, it is to allow a convenient way to easily turn a JSON
> document from a service or a user into a CQL row, with all the validation
> that entails.
> Since we are not looking to support schemaless documents, we will not be
> adding a JSON data type (CASSANDRA-6833) a la postgresql. Rather, we will
> map the JSON to UDT, collections, and primitive CQL types.
> Here's how this might look:
> {code}
> CREATE TYPE address (
> street text,
> city text,
> zip_code int,
> phones set<text>
> );
> CREATE TABLE users (
> id uuid PRIMARY KEY,
> name text,
> addresses map<text, address>
> );
> INSERT INTO users JSON
> {‘id’: 4b856557-7153,
> ‘name’: ‘jbellis’,
> ‘address’: {“home”: {“street”: “123 Cassandra Dr”,
> “city”: “Austin”,
> “zip_code”: 78747,
> “phones”: [2101234567]}}};
> SELECT JSON id, address FROM users;
> {code}
> (We would also want to_json and from_json functions to allow mapping a single
> column's worth of data. These would not require extra syntax.)
> [1] http://rustyrazorblade.com/2014/07/the-myth-of-schema-less/
> [2] https://blog.compose.io/schema-less-is-usually-a-lie/
> [3] http://dl.acm.org/citation.cfm?id=2481247
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)