Viraj Jasani created PHOENIX-7330:
-------------------------------------

             Summary: Introducing Binary JSON (BSON) with Complex Document 
structures in Phoenix
                 Key: PHOENIX-7330
                 URL: https://issues.apache.org/jira/browse/PHOENIX-7330
             Project: Phoenix
          Issue Type: New Feature
            Reporter: Viraj Jasani


The purpose of this Jira is to introduce a new data type in Phoenix, Binary JSON 
(BSON), to manage more complex document data structures in Phoenix.

BSON, or Binary JSON, is a binary-encoded serialization of JSON-like documents. 
The BSON data type lets users store, update and query part or all of a 
BsonDocument efficiently, without having to serialize/deserialize the whole 
document to/from binary format. BSON allows deserializing only part of a nested 
document, so querying or indexing any attribute within the nested structure 
becomes more efficient because only the required fields are deserialized at 
runtime. Any other document structure would require deserializing the entire 
binary into a document and then performing the query.
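
As a rough illustration of partial deserialization, the sketch below encodes a 
small nested document by hand and then reads a single nested field by walking 
the bytes, skipping over every value it does not need. It is a minimal sketch 
using only the Python standard library, not Phoenix's implementation: it 
handles just four BSON element types (double, string, embedded document, 
int32), and the names encode_doc and find_field are hypothetical.

```python
import struct

def encode_doc(doc):
    """Encode a dict of int32/float/str/dict values as a BSON document."""
    body = b""
    for name, value in doc.items():
        key = name.encode("utf-8") + b"\x00"
        if isinstance(value, float):
            body += b"\x01" + key + struct.pack("<d", value)
        elif isinstance(value, str):
            raw = value.encode("utf-8") + b"\x00"
            body += b"\x02" + key + struct.pack("<i", len(raw)) + raw
        elif isinstance(value, dict):
            body += b"\x03" + key + encode_doc(value)
        elif isinstance(value, int):
            body += b"\x10" + key + struct.pack("<i", value)
        else:
            raise TypeError("unsupported type: " + type(value).__name__)
    # Total size counts the 4-byte length prefix and the trailing 0x00.
    return struct.pack("<i", len(body) + 5) + body + b"\x00"

def value_size(type_tag, data, pos):
    """Return how many bytes the value at pos occupies, without decoding it."""
    if type_tag == 0x01:
        return 8
    if type_tag == 0x02:
        return 4 + struct.unpack_from("<i", data, pos)[0]
    if type_tag == 0x03:
        return struct.unpack_from("<i", data, pos)[0]
    if type_tag == 0x10:
        return 4
    raise ValueError("unsupported BSON type 0x%02x" % type_tag)

def find_field(data, path, pos=0):
    """Read one field addressed by a list of keys, skipping all other values."""
    end = pos + struct.unpack_from("<i", data, pos)[0] - 1  # trailing 0x00
    pos += 4
    while pos < end:
        type_tag = data[pos]
        name_end = data.index(b"\x00", pos + 1)
        name = data[pos + 1:name_end].decode("utf-8")
        pos = name_end + 1
        if name == path[0]:
            if len(path) > 1:
                if type_tag != 0x03:
                    raise KeyError(".".join(path))
                return find_field(data, path[1:], pos)  # descend in place
            if type_tag == 0x01:
                return struct.unpack_from("<d", data, pos)[0]
            if type_tag == 0x02:
                size = struct.unpack_from("<i", data, pos)[0]
                return data[pos + 4:pos + 4 + size - 1].decode("utf-8")
            if type_tag == 0x10:
                return struct.unpack_from("<i", data, pos)[0]
            return data[pos:pos + value_size(type_tag, data, pos)]
        pos += value_size(type_tag, data, pos)  # skip without decoding
    raise KeyError(".".join(path))

data = encode_doc({
    "id": 7,
    "profile": {"name": "ada", "address": {"city": "Pune", "zip": 411001}},
    "score": 9.5,
})
zip_code = find_field(data, ["profile", "address", "zip"])
```

The length prefixes on strings and embedded documents are what make the skip 
step cheap: the reader jumps past a value without parsing its contents.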

BSON spec: https://bsonspec.org/

JSON and BSON are closely related by design. BSON serves as a binary 
representation of JSON data, tailored with specialized extensions for wider 
application scenarios, and finely tuned for efficient data storage and 
traversal. Similar to JSON, BSON facilitates the embedding of objects and 
arrays.

 

One particular way in which BSON differs from JSON is in its support for some 
more advanced data types. For instance, JSON does not differentiate between 
integers (whole numbers) and floating-point numbers (with decimal precision). 
BSON does distinguish between the two and stores them in the corresponding BSON 
data type (e.g. BsonInt32 vs BsonDouble). Many server-side programming 
languages offer several primitive data types (commonly integer, 
single-precision floating point, i.e. “float”, double-precision floating point, 
i.e. “double”, and boolean), each with its own optimal usage for efficient 
mathematical operations.
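
To make the integer/double distinction concrete, the following minimal sketch 
builds two one-field BSON documents by hand with the Python standard library. 
The type tags (0x10 for a 32-bit integer, 0x01 for a double) come from the 
BSON specification; the helper names bson_int32 and bson_double are 
hypothetical and are not part of Phoenix.

```python
import struct

def wrap(element):
    """Wrap one encoded element in a BSON document (length prefix + 0x00)."""
    return struct.pack("<i", len(element) + 5) + element + b"\x00"

def bson_int32(name, value):
    """Encode {name: value} with the value stored as a 32-bit integer."""
    key = name.encode("utf-8") + b"\x00"
    return wrap(b"\x10" + key + struct.pack("<i", value))

def bson_double(name, value):
    """Encode {name: value} with the value stored as a 64-bit double."""
    key = name.encode("utf-8") + b"\x00"
    return wrap(b"\x01" + key + struct.pack("<d", value))

as_int = bson_int32("n", 1)
as_double = bson_double("n", 1.0)

# Byte 4 is the element's type tag, directly after the 4-byte length prefix.
int_tag = as_int[4]        # 0x10
double_tag = as_double[4]  # 0x01
```

The same numeric value therefore has two different encodings (12 bytes versus 
16 bytes here), and a reader can tell which one it has from the tag alone.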

Another key distinction between BSON and JSON is that BSON documents have the 
capability to include Date or Binary objects, which cannot be directly 
represented in pure JSON format. BSON also provides the ability to store and 
retrieve user-defined Binary objects. Likewise, by integrating advanced data 
structures like Sets into BSON documents, we can significantly enhance the 
capabilities of Phoenix for storing, retrieving, and updating Binary, Sets, 
Lists, and Documents as nested or complex data types.
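
A short sketch of the Date and Binary point, again using only the Python 
standard library: the stock JSON encoder rejects raw bytes and datetime 
values, while BSON has dedicated element types for both (0x05 for binary, 
0x09 for a UTC datetime stored as milliseconds since the Unix epoch, per the 
BSON specification). The helper names are hypothetical, and the sketch does 
not cover Sets.

```python
import json
import struct
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def json_can_encode(value):
    """Return True if the standard json module can serialize the value."""
    try:
        json.dumps(value)
        return True
    except TypeError:
        return False

def bson_binary_element(name, payload, subtype=0x00):
    """Encode a binary element: tag, name, int32 length, subtype, bytes."""
    key = name.encode("utf-8") + b"\x00"
    header = struct.pack("<i", len(payload)) + bytes([subtype])
    return b"\x05" + key + header + payload

def bson_datetime_element(name, moment):
    """Encode a UTC datetime element as int64 milliseconds since the epoch."""
    key = name.encode("utf-8") + b"\x00"
    millis = (moment - EPOCH) // timedelta(milliseconds=1)
    return b"\x09" + key + struct.pack("<q", millis)

def wrap_document(elements):
    """Add the int32 length prefix and trailing 0x00 around the elements."""
    return struct.pack("<i", len(elements) + 5) + elements + b"\x00"

payload = b"\x00\x01"
created = datetime(2024, 1, 1, tzinfo=timezone.utc)

doc = wrap_document(
    bson_binary_element("blob", payload)
    + bson_datetime_element("at", created)
)
```

In plain JSON, both values would first have to be converted to strings (for 
example Base64 and ISO 8601), and the reader would need to know to convert 
them back.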

Moreover, the JSON format is both human-readable and machine-readable, whereas 
the BSON format is only machine-readable. Hence, as part of introducing the 
BSON data type, we also need to provide an interface that lets users supply 
human-readable JSON as input for the BSON data type.

This Jira also introduces access and update functions for BSON documents.

BSON_CONDITION_EXPRESSION can evaluate a condition expression on the document 
fields, similar to how a WHERE clause evaluates a condition expression on 
various columns of the given row(s) for relational tables.

BSON_UPDATE_EXPRESSION can perform one or more document field updates, similar 
to how UPSERT statements can update one or more columns of the given row(s) for 
relational tables.
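
The idea behind the two functions can be sketched on an in-memory document. 
The Python below is only a conceptual analogue, assuming a single comparison 
as the condition and a flat map of dotted paths as the update: it is not the 
Phoenix expression syntax, and conditional_update, get_path and set_path are 
hypothetical names.

```python
import copy
import operator

OPS = {"=": operator.eq, "<": operator.lt, ">": operator.gt}

def get_path(doc, path):
    """Read a nested field addressed by a dotted path such as 'a.b.c'."""
    node = doc
    for key in path.split("."):
        node = node[key]
    return node

def set_path(doc, path, value):
    """Write a nested field, creating intermediate documents as needed."""
    *parents, leaf = path.split(".")
    node = doc
    for key in parents:
        node = node.setdefault(key, {})
    node[leaf] = value

def conditional_update(doc, condition, updates):
    """Apply every update only if the condition holds on the document."""
    path, op, expected = condition
    compare = OPS[op]
    try:
        matched = compare(get_path(doc, path), expected)
    except (KeyError, TypeError):
        matched = False  # a missing field never satisfies the condition
    if not matched:
        return doc, False
    updated = copy.deepcopy(doc)  # leave the caller's document untouched
    for field, value in updates.items():
        set_path(updated, field, value)
    return updated, True

order = {"status": "open", "customer": {"tier": "gold", "credit": 50}}

shipped, applied = conditional_update(
    order,
    ("customer.credit", ">", 10),
    {"status": "shipped", "customer.credit": 40},
)
```

The condition plays the role a WHERE clause plays for columns, and the update 
map plays the role of the column assignments in an UPSERT.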

Overall, by combining BSON with functionality already available in Phoenix, 
such as secondary indexes, conditional updates and high-throughput reads and 
writes, we can evolve Phoenix into a highly scalable document database.



