[jira] [Commented] (CASSANDRA-7970) JSON support for CQL

Sylvain Lebresne (JIRA) Mon, 23 Feb 2015 10:28:58 -0800

    [ 
https://issues.apache.org/jira/browse/CASSANDRA-7970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14333620#comment-14333620
 ]


Sylvain Lebresne commented on CASSANDRA-7970:
---------------------------------------------

bq. I attempted to improve the handling of collection serialization formats 
\[...\]. I have mixed feelings about the results.

Me too :). I do think some clarification around this can be done, but I'm not 
entirely sure having a special version for collection serialization is 
necessarily the best way to go. Overall, I'd prefer leaving such refactor out 
of this ticket.

Other remarks on the patch:
* For the textile doc, I'd split out a new JSON section and only link to that 
from the {{INSERT}} and {{SELECT}} doc. Documenting JSON in the middle of the 
statements feels slightly confusing, it would feel clearer to me to concentrate 
it in a separate section, one that also start by stating what JSON support is 
all about (i.e. really just a convenience for getting data from/to a JSON 
form).  Also, the doc is fully up to date with the patch (mention that 
{{VALUES}} for {{INSERT}} is optional when using JSON for instance) - 
{{cql3handling.py}} is not up to date on the {{INSERT}} syntax.
* When doing {{INSERT INTO foo JSON ?}}, I'd maybe use {{\[json\]}} as receiver 
name to be consistent with what we do for {{LIMIT}}, {{TIMESTAMP}} and {{TTL}}. 
Also, UpdateStatement.ParsedInsert.jsonReceiverSpec is unused. Same for 
{{SELECT}} (for consistency with {{\[applied\]}}).
* Should make {{FromJsonFct/toJsonFct}} extends {{NativeScalarFunction}} 
directly.
* It's not entirely this patch fault but I don't understand why we need 
{{FunctionName.equalsNativeFunction}} (why not just use equals)?  The only 
difference is that if {{this}} doesn't have a keyspace, it doesn't check the 
other name doesn't either, but that just feels wrong to me tbh.
* For UDT, I think we need to care about case sensitivity in {{fromJSONObject}}.
* In TupleType/UDTType.fromJSONObject, we should force the format to V3. It's 
also a tad weird to use CollectionSerializer.writeValue to write the UDT. Why 
not use the TupleValue.buildValue() function? Or my next suggestion.
* If we made {{fromJSONObject}} return a {{Term}}, we'd avoid serializing 
collection to re-deserialize them right away. We'd also leave the serialization 
of UDT/Tuple to UserTypes/TupleTypes.DelayedValues (doesn't hurt to concentrate 
it in one place as much as possible) and save a bunch of changes (in 
ColumnCondition, Lists, Sets and Maps).
* I'd prefer limiting the json pollution in {{Selection}}. We (mostly) really 
need to bother about JSON when writing the ResultSet, so we could simply pass 
the isJson to the {{resultSetBuilder}} method (somehow it makes more sense to 
me to call {{rowToJson}} in the builder anyway).  The one other place we need 
it is {{getResultMetadata}}, but we could also pass a {{isJson}} flag there and 
either build the metadata if it's json (not a big deal) or find a simple way to 
cache the json metadata per-table (since it's always the same thing except for 
the keyspace and table name).
* {{FromJsonFct}} should probably use the new {{ExecutionException}} from 
CASSANDRA-8528 rather than {{InvalidRequestException}}.
* For {{INSERT JSON}}, I think having a ParsedInsertJson separate from 
ParsedInsert makes things a bit cleaner. I also feel that Json.java could 
encapsulate a bit better the necessary specificity of {{INSERT JSON}}. As I'm 
not sure how to express clearly what I mean (and because I wasn't sure that 
what I had in mind would really work), I've push a commit at 
[here|https://github.com/pcmanus/cassandra/commits/7970] shows what I have in 
mind. It's not really different from the initial approach but it feels cleaner 
overall. How do you feel about it?
* Any particular reason for having {{DecimalType}} and {{IntegerType}} quote 
their values (they use the default {{toJSONString}}). I would have expected 
them to translate to JSON numbers since as far as I can tell JSON doesn't 
impose any size limit on its numbers.
* Maybe we could abstract slighty our use of jackson (put the helpers we need 
in Json.java maybe?), so that 1) we have only one place to change if we upgrade 
jackson and the API change (or we want to change of library) and 2) we save 
creating multiple ObjectMapper or JsonStringEncoder objects.

Also two minor nits:
* The patch makes ResultSet.COUNT_COLUMN public for no reason it seems.
* Out of curiosity, was there a profound reason to make Term.Terminal take the 
protocol version only (instead of the full QueryOptions)? If it's just because 
it's the only thing used, that's fine, but just fyi passing QueryOptions was 
done so we don't have to change everything again if a future Term needs 
something else than the protocol version.

> JSON support for CQL
> --------------------
>
>                 Key: CASSANDRA-7970
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7970
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: API
>            Reporter: Jonathan Ellis
>            Assignee: Tyler Hobbs
>              Labels: client-impacting, cql3.3, docs-impacting
>             Fix For: 3.0
>
>         Attachments: 7970-trunk-v1.txt
>
>
> JSON is popular enough that not supporting it is becoming a competitive 
> weakness.  We can add JSON support in a way that is compatible with our 
> performance goals by *mapping* JSON to an existing schema: one JSON documents 
> maps to one CQL row.
> Thus, it is NOT a goal to support schemaless documents, which is a misfeature 
> [1] [2] [3].  Rather, it is to allow a convenient way to easily turn a JSON 
> document from a service or a user into a CQL row, with all the validation 
> that entails.
> Since we are not looking to support schemaless documents, we will not be 
> adding a JSON data type (CASSANDRA-6833) a la postgresql.  Rather, we will 
> map the JSON to UDT, collections, and primitive CQL types.
> Here's how this might look:
> {code}
> CREATE TYPE address (
>   street text,
>   city text,
>   zip_code int,
>   phones set<text>
> );
> CREATE TABLE users (
>   id uuid PRIMARY KEY,
>   name text,
>   addresses map<text, address>
> );
> INSERT INTO users JSON
> {‘id’: 4b856557-7153,
>    ‘name’: ‘jbellis’,
>    ‘address’: {“home”: {“street”: “123 Cassandra Dr”,
>                         “city”: “Austin”,
>                         “zip_code”: 78747,
>                         “phones”: [2101234567]}}};
> SELECT JSON id, address FROM users;
> {code}
> (We would also want to_json and from_json functions to allow mapping a single 
> column's worth of data.  These would not require extra syntax.)
> [1] http://rustyrazorblade.com/2014/07/the-myth-of-schema-less/
> [2] https://blog.compose.io/schema-less-is-usually-a-lie/
> [3] http://dl.acm.org/citation.cfm?id=2481247



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (CASSANDRA-7970) JSON support for CQL

Reply via email to