Hi,

After going through the alternative data models for representing objects,
including those optimized for lookup, we have decided to go with the most
basic model, Option 1, as suggested by Preston and recorded in the wiki[1].
The reasoning is that once things work with the simple method, further
optimization can be carried out on top of it.

[1] https://cwiki.apache.org/confluence/display/VXQUERY/JSONiq
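
For anyone skimming the thread, here is a rough sketch of what the basic
model implies for jdm:keys and jdm:value. It is written in Python only for
brevity (VXQuery itself is Java), and the byte layout, the tag values and
the function names are assumptions made for illustration; the authoritative
layout is the one on the wiki [1].

```python
import struct

# Hypothetical tags, for illustration only; not the VXQuery value tags.
TAG_STRING = 1
TAG_INT = 2


def encode_value(v):
    """Return (tag, payload) for a str or int value."""
    if isinstance(v, str):
        return TAG_STRING, v.encode("utf-8")
    if isinstance(v, int):
        return TAG_INT, struct.pack(">q", v)
    raise TypeError("unsupported value type in this sketch")


def decode_value(tag, payload):
    """Inverse of encode_value."""
    if tag == TAG_STRING:
        return payload.decode("utf-8")
    if tag == TAG_INT:
        return struct.unpack(">q", payload)[0]
    raise ValueError("unknown tag")


def encode_object(pairs):
    """Serialize (key, value) pairs compactly, in their original order.

    Assumed layout: 4-byte pair count, then for each pair a 2-byte key
    length, the key bytes, a 1-byte tag, a 4-byte value length, the value.
    """
    out = bytearray(struct.pack(">I", len(pairs)))
    for key, value in pairs:
        k = key.encode("utf-8")
        tag, payload = encode_value(value)
        out += struct.pack(">H", len(k)) + k
        out += struct.pack(">BI", tag, len(payload)) + payload
    return bytes(out)


def _entries(buf):
    """Yield (key, tag, payload) by walking the pairs sequentially."""
    (count,) = struct.unpack_from(">I", buf, 0)
    pos = 4
    for _ in range(count):
        (klen,) = struct.unpack_from(">H", buf, pos)
        pos += 2
        key = buf[pos:pos + klen].decode("utf-8")
        pos += klen
        tag, vlen = struct.unpack_from(">BI", buf, pos)
        pos += 5
        yield key, tag, buf[pos:pos + vlen]
        pos += vlen  # skip over the value to reach the next key


def keys(buf):
    """All keys, in the order they were stored."""
    return [key for key, _, _ in _entries(buf)]


def value(buf, name):
    """Sequential search with a string comparison at each field."""
    for key, tag, payload in _entries(buf):
        if key == name:
            return decode_value(tag, payload)
    return None
```

Both keys() and value() walk the pairs one by one, so access is linear in
the number of keys. That is the cost we accept for now in exchange for a
compact and simple format.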

Thank you.
Riyafa

On 10 May 2016 at 10:59, Michael Carey <[email protected]> wrote:

> Sounds like a great plan!
>
>
>
> On 5/9/16 10:18 PM, Till Westmann wrote:
>
>>
>>
>> On 9 May 2016, at 12:02, Preston Carman wrote:
>>
>>> I think we have three options: optimize for space, for keys (jdm:keys),
>>> or for field lookup (jdm:value). The optimizations for keys and field
>>> lookup could be done independently. Let's consider the option currently
>>> in the wiki as option 1 (space). Don't remove this option from the wiki,
>>> so that we have a reference. The new options for keys and field lookup
>>> can be added as options 2 and 3.
>>>
>>> Option 1 (space): A tightly compact format that is optimized to save
>>> space.
>>> Option 2 (keys): A data model optimized for accessing a list of keys.
>>> Option 3 (lookup): A data model optimized for accessing a field in the
>>> object.
>>>
>>> For option 2 (keys):
>>> Consider the return value for jdm:keys: jdm:keys($o as object()) as
>>> xs:string*
>>> I am not sure I fully understand what xs:string* represents. Is this a
>>> sequence of strings as in XQuery, an array in JSONiq, or some other
>>> structure? The most efficient way to return the keys would be to store
>>> them in the same form in which they should be returned. This way you can
>>> do a simple copy to produce the result without further processing. In
>>> this case, storing them as a sequence (or array) of string values
>>> might be the best option. The values would then need to be a separate
>>> sequence (or array) of typed values in the object data model. Pro:
>>> easy keys function. Con: an added list of offsets for the keys.
>>>
>>
>> xs:string* is indeed a sequence of strings
>>
>>> For option 3 (lookup):
>>> This option is independent of option 2. As Till suggested, we can
>>> implement this at a later date. We would need a method to improve the
>>> lookup of a field. Options 1 and 2 require a sequential search of the
>>> keys and a string comparison at each field. The AsterixDB record data
>>> model is a little more complex than I first thought. Take a look at
>>> their record implementation: writing the record [1] (lines 205 to 245
>>> are interesting) and field lookup [2] (lines 277 to 344). We only
>>> need to consider the open part of the record. (The closed part can be
>>> ignored.)
>>>
>>
>> I had another idea for the implementation of the dictionary. We could
>> store the keys in sorted order - while we store the values in the original
>> order. If each key is then followed by the offset to the value, we would
>> get
>> a) a log n access for a value (as the keys are sorted and we can do binary
>>    search) and
>> b) the keys in their original order, if we sort them by the offsets.
>> Assuming that the value() access is quite a bit more common than the
>> keys() access this could be a reasonable trade-off.
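
A rough in-memory sketch of the sorted-dictionary idea above, in Python for
brevity (VXQuery itself is Java). It uses lists instead of a byte layout,
assumes the keys within one object are unique, and the names build, value
and keys are hypothetical, chosen only for this illustration.

```python
from bisect import bisect_left


def build(pairs):
    """Return (dictionary, values) for a list of (key, value) pairs.

    The values stay in their original order. Each dictionary entry is
    (key, offset), where offset is the position of the value in `values`,
    and the entries are sorted by key.
    """
    values = [v for _, v in pairs]
    dictionary = sorted((k, i) for i, (k, _) in enumerate(pairs))
    return dictionary, values


def value(dictionary, values, name):
    """Binary search over the sorted keys: O(log n) key comparisons."""
    # (name, -1) sorts before any real entry (name, offset >= 0).
    i = bisect_left(dictionary, (name, -1))
    if i < len(dictionary) and dictionary[i][0] == name:
        return values[dictionary[i][1]]
    return None


def keys(dictionary):
    """Recover the original key order by sorting the entries by offset."""
    return [k for k, _ in sorted(dictionary, key=lambda entry: entry[1])]
```

The trade-off is visible in the two accessors: value() no longer scans every
key, while keys() now pays for a sort by offset to restore the original
order.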
>>
>>> Comments?
>>>
>>
>> Sounds good to list the options on the Wiki page.
>>
>>> Also, what is the actual result of jdm:keys?
>>>
>>
>> A sequence of strings.
>>
>>> What is the requirement for the initial implementation?
>>>
>>
>> It should be correct and tested.
>>
>> My 2c,
>> Till
>>
>> [1]
>>> https://github.com/apache/incubator-asterixdb/blob/master/asterixdb/asterix-om/src/main/java/org/apache/asterix/builders/RecordBuilder.java
>>> [2]
>>> https://github.com/apache/incubator-asterixdb/blob/master/asterixdb/asterix-om/src/main/java/org/apache/asterix/dataflow/data/nontagged/serde/ARecordSerializerDeserializer.java
>>>
>>> On Mon, May 9, 2016 at 8:35 AM, Riyafa Abdul Hameed
>>> <[email protected]> wrote:
>>>
>>>> Hi,
>>>>
>>>> Is there any documentation I could go through to understand the
>>>> AsterixDB
>>>> Hash code implementation on the open fields? I am not sure I understand
>>>> enough from the AsterixDB serialization [1] to define the data model for
>>>> objects using it.
>>>>
>>>> Sorry about any confusion.
>>>>
>>>> [1]
>>>>
>>>> https://cwiki.apache.org/confluence/display/ASTERIXDB/AsterixDB+Object+Serialization+Reference
>>>>
>>>> Thank you.
>>>> Riyafa
>>>>
>>>> On 9 May 2016 at 20:16, Michael J. Carey <[email protected]> wrote:
>>>>
>>>>> I think Preston's suggestion of looking at the AsterixDB implementation
>>>>> of its binary data model is a good one, as it shares the requirement of
>>>>> efficient field access by name, and several VXQuery folks are experts in
>>>>> its details as well. I believe it may use a sorted list instead of a
>>>>> hash table internally, which is perhaps slightly simpler for updates.
>>>>> On May 9, 2016 7:35 AM, "Riyafa Abdul Hameed" <[email protected]>
>>>>> wrote:
>>>>>
>>>>> Hi again,
>>>>>
>>>>> I have been thinking about Till's suggestion of using a dictionary, and
>>>>> I think it would be a better alternative because then we wouldn't have
>>>>> to process the value tag of the value of a particular key before moving
>>>>> to the next key. Hence it would be easy to implement the jdm:keys
>>>>> method. Any suggestions? Shall I update the wiki and the doc based on
>>>>> this?
>>>>>
>>>>> Thank you.
>>>>> Riyafa
>>>>>
>>>>> On 9 May 2016 at 19:21, Riyafa Abdul Hameed <[email protected]>
>>>>> wrote:
>>>>>
>>>>> Hi Till,
>>>>>>
>>>>>> Currently I have suggested storing each key followed by its value.
>>>>>> This uses less space and is quite similar to storing the offsets of
>>>>>> the values, and the access is also linear in the number of keys.
>>>>>>
>>>>>> Thanks.
>>>>>> Riyafa
>>>>>>
>>>>>> On 9 May 2016 at 18:54, Till Westmann <[email protected]> wrote:
>>>>>>
>>>>>>> All of this looks pretty good!
>>>>>>>
>>>>>>> Wrt. the question of the dictionary for the fields, I think that we
>>>>>>> should consider the 2 ways that we can access an object:
>>>>>>> 1. Either we get all keys (jdm:keys) or
>>>>>>> 2. we get a value for a key (jdm:value).
>>>>>>>
>>>>>>> To get all the keys efficiently and to be able to skip huge nested
>>>>>>> values, a simple approach could be to store a dictionary of the keys
>>>>>>> (in their original order) with pointers (offsets) to the values. That
>>>>>>> way we could get the keys quickly by scanning the dictionary and each
>>>>>>> value by scanning the dictionary + 1 hop to find the value. This
>>>>>>> certainly has the problem that the access is linear in the number of
>>>>>>> keys. But it is reasonably simple and it would allow us to get a
>>>>>>> correct + testable implementation relatively soon and to have a
>>>>>>> baseline for a more optimized representation.
>>>>>>>
>>>>>>> Thoughts?
>>>>>>>
>>>>>>> Cheers,
>>>>>>> Till
>>>>>>>
>>>>>>> [1]
>>>>>>>
>>>>>>>
>>>>>
>>>>> http://jsoniq.org/docs/JSONiqExtensionToXQuery/html-single/index.html#idm139680641300880
>>>>>
>>>>>>
>>>>>>> On 8 May 2016, at 22:19, Riyafa Abdul Hameed wrote:
>>>>>>>
>>>>>>> Hi Preston,
>>>>>>>
>>>>>>>>
>>>>>>>> I have edited the wiki[1] and the doc[2] based on the comments. Thank
>>>>>>>> you for the suggestions provided. I have removed the part that
>>>>>>>> assigns an id to the keys and instead suggested that the keys be
>>>>>>>> stored in the order they appear in the JSON object. I am not sure I
>>>>>>>> understand the concept of the hash code: how are the hash codes used
>>>>>>>> for easy lookup generated?
>>>>>>>>
>>>>>>>>
>>>>>>>> [1]https://cwiki.apache.org/confluence/display/VXQUERY/JSONiq
>>>>>>>> [2]
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>
>>>>> https://drive.google.com/open?id=1-wT0pE8rTTNIzuY4iTgvhqkdHmKGek4CgNthXN6mlm0
>>>>>
>>>>>>
>>>>>>>> Thank you again.
>>>>>>>>
>>>>>>>> Yours sincerely,
>>>>>>>> Riyafa
>>>>>>>>
>>>>>>>> On 9 May 2016 at 01:23, christina pavlopoulou <[email protected]>
>>>>>>>>
>>>>>>> wrote:
>>>>>
>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>>>
>>>>>>>>> I updated the wiki page according to Preston's comments along with
>>>>>>>>> the
>>>>>>>>> json array example in [1].
>>>>>>>>>
>>>>>>>>> [1]
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>
>>>>> https://docs.google.com/document/d/1GOAcvhw_F9cJrNmRq2TwZxI0wYRmvLEV3mywJS4H9Lg/edit
>>>>>
>>>>>>
>>>>>>>>> Thank you,
>>>>>>>>> Christina
>>>>>>>>>
>>>>>>>>> On 5/8/2016 9:43 AM, Preston Carman wrote:
>>>>>>>>>
>>>>>>>>>> Nice job guys. I can see you are picking up how to create a data
>>>>>>>>>> model. I have limited my comments to the wiki [1] for now. At a
>>>>>>>>>> high level, I was impressed with your detail and thoughtful
>>>>>>>>>> layouts. It reminds me of the age-old trade-off: speed vs space. At
>>>>>>>>>> this time, let's err on the side of saving space. The data model
>>>>>>>>>> should be as compact as possible.
>>>>>>>>>>
>>>>>>>>>> I also found the AsterixDB serialization [2] we can use as a
>>>>>>>>>> reference. Even though the AsterixDB data model includes object
>>>>>>>>>> length, I would leave that out since none of the XQuery data models
>>>>>>>>>> include this property.
>>>>>>>>>>
>>>>>>>>>> Riyafa, take a look at the method AsterixDB uses for quick lookups
>>>>>>>>>> (a hash value for the name). Consider the pros and cons between
>>>>>>>>>> your method and AsterixDB's method: a list of hash values for the
>>>>>>>>>> names versus a sorted list of names.
>>>>>>>>>>
>>>>>>>>>> Also, take a look at my wiki comments. It's a great start!
>>>>>>>>>>
>>>>>>>>>> Mahalo,
>>>>>>>>>> Preston
>>>>>>>>>>
>>>>>>>>>> [1] https://cwiki.apache.org/confluence/display/VXQUERY/JSONiq
>>>>>>>>>> [2]
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>
>>>>> https://cwiki.apache.org/confluence/display/ASTERIXDB/AsterixDB+Object+Serialization+Reference
>>>>>
>>>>>>
>>>>>>>>>> On Sat, May 7, 2016 at 6:47 PM, christina pavlopoulou <
>>>>>>>>>> [email protected]>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> I also designed an example for the JSON array [1], based on the
>>>>>>>>>>> description I wrote on the wiki page.
>>>>>>>>>>>
>>>>>>>>>>> [1]
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>
>>>>> https://docs.google.com/document/d/1GOAcvhw_F9cJrNmRq2TwZxI0wYRmvLEV3mywJS4H9Lg/edit
>>>>>
>>>>>>
>>>>>>>>>>> Thank you,
>>>>>>>>>>> Christina
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 5/7/2016 11:22 AM, Riyafa Abdul Hameed wrote:
>>>>>>>>>>>
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> I am attempting to create a doc on the JSONiq data model for
>>>>>>>>>>>> objects[1] (it might be full of errors because I am doing the
>>>>>>>>>>>> calculations manually).
>>>>>>>>>>>>
>>>>>>>>>>>> This is what I have come up with for the data model for objects:
>>>>>>>>>>>>
>>>>>>>>>>>> The first byte would have the value tag, followed by the id (4
>>>>>>>>>>>> bytes) of the object. Then 4 bytes to represent the size of the
>>>>>>>>>>>> object. Then another 4 bytes to represent the number of
>>>>>>>>>>>> key-value pairs. The next few bytes represent the offsets of the
>>>>>>>>>>>> keys which follow (each offset is represented by 4 bytes). Ids
>>>>>>>>>>>> would be assigned to the keys. The next few bytes would be a
>>>>>>>>>>>> sorted list of ids for keys in alphabetical order. The following
>>>>>>>>>>>> bytes would represent the keys in the object. Each key is a
>>>>>>>>>>>> StringPointable followed by the id of the key. Each object would
>>>>>>>>>>>> have a sequence pointable: the following bytes would be the
>>>>>>>>>>>> number of items (items are the values for keys) in the sequence.
>>>>>>>>>>>> The next bytes would be the offset of each item in the sequence.
>>>>>>>>>>>> The last bytes would be the values for each key, followed by the
>>>>>>>>>>>> respective id of the key.
>>>>>>>>>>>>
>>>>>>>>>>>> Hope it makes sense.
>>>>>>>>>>>>
>>>>>>>>>>>> My problem is,
>>>>>>>>>>>>
>>>>>>>>>>>> I have not provided for the white spaces in the object. What can
>>>>>>>>>>>> I use to represent the white spaces? I cannot use a text node
>>>>>>>>>>>> because an object is not a node.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> [1]
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>
>>>>> https://drive.google.com/open?id=1-wT0pE8rTTNIzuY4iTgvhqkdHmKGek4CgNthXN6mlm0
>>>>>
>>>>>>
>>>>>>>>>>>> Thank you.
>>>>>>>>>>>>
>>>>>>>>>>>> Yours sincerely,
>>>>>>>>>>>> Riyafa
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On 26 April 2016 at 10:29, Preston Carman <[email protected]>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> We have two students working with us this summer through GSOC
>>>>>>>>>>>>> to complete the JSONiq specification for arrays and objects. I
>>>>>>>>>>>>> think the first step is to define the data model used by
>>>>>>>>>>>>> JSONiq. The definition should be written up in our wiki [1]
>>>>>>>>>>>>> before coding starts this summer. The wiki will allow the
>>>>>>>>>>>>> community to discuss the JSON data model implementation in
>>>>>>>>>>>>> VXQuery.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I updated the JSONiq wiki to help get the documentation
>>>>>>>>>>>>> started. Please fill in the JSON data model based on the
>>>>>>>>>>>>> examples seen on our website (links on the wiki page).
>>>>>>>>>>>>>
>>>>>>>>>>>>> Post here if you have any questions.
>>>>>>>>>>>>>
>>>>>>>>>>>>> [1] https://cwiki.apache.org/confluence/display/VXQUERY/JSONiq
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>
>>>>>>>> --
>>>>>>>> Riyafa Abdul Hameed
>>>>>>>> Undergraduate, University of Moratuwa
>>>>>>>>
>>>>>>>> Email: [email protected]
>>>>>>>> Website: https://riyafa.wordpress.com/ <
>>>>>>>> http://riyafa.wordpress.com/>
>>>>>>>> <http://facebook.com/riyafa.ahf> <http://lk.linkedin.com/in/riyafa>
>>>>>>>> <http://twitter.com/Riyafa1>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>
>


-- 
Riyafa Abdul Hameed
Undergraduate, University of Moratuwa

Email: [email protected]
Website: https://riyafa.wordpress.com/ <http://riyafa.wordpress.com/>
<http://facebook.com/riyafa.ahf>  <http://lk.linkedin.com/in/riyafa>
<http://twitter.com/Riyafa1>
