Hi.

I updated my report in the Wiki [1]. Also, I pushed my latest commits to my
branch [2]. Please take a look if you have time.

This week, I will look into the MapReduce tests for DataStores.

Please let me know if you have suggestions.

[1]
https://cwiki.apache.org/confluence/display/GORA/GORA-485+Apache+Kudu+datastore+for+Gora+Reports
[2] https://github.com/jhnmora000/gora/tree/GORA-485

Thanks,
John

On Sat, Jul 13, 2019 at 19:31, John Mora (<jhnmora...@gmail.com>)
wrote:

> Hi all
>
> I updated my report in the Wiki [1]. Also, I pushed my latest commits to my
> branch [2]. Please take a look if you have time.
>
> This week, I will be working on the getPartitions and deleteByQuery
> methods and running the other tests in the DataStoreTestBase class.
>
> Please let me know if you have suggestions.
>
> [1]
> https://cwiki.apache.org/confluence/display/GORA/GORA-485+Apache+Kudu+datastore+for+Gora+Reports
> [2] https://github.com/jhnmora000/gora/tree/GORA-485
>
> Best,
> John.
>
> On Wed, Jul 10, 2019 at 16:17, John Mora (<jhnmora...@gmail.com>)
> wrote:
>
>> Hi Alfonso,
>>
>> Thanks so much for your time and support for this project. I will work on
>> your comments. Responses inline :)
>>
>>
>> On Tue, Jul 9, 2019 at 16:38, Alfonso Nishikawa (<
>> alfonso.nishik...@gmail.com>) wrote:
>>
>>> Hi, John.
>>>
>>> Sorry for the delay, I am changing jobs and I have been very busy :( I
>>> will try to answer your questions :)
>>>
>>> *> In the Employee example there is a field called 'dateOfBirth'. I
>>> tried to map that field to the UNIXTIME_MICROS datatype of Kudu (I
>>> intuitively assumed this is a date). However, in the Java world the
>>> Employee field is a Long value and the Kudu datatype is a Timestamp. So, I
>>> was wondering whether I should force the usage of the UNIXTIME_MICROS
>>> datatype for this field or just use a LONG datatype in Kudu.*
>>>
>>> Avro 1.8 introduced "Logical Types", so there is a "date" type with an
>>> underlying "int" [1]. It's the first time I have read about them, because
>>> they weren't there until the last Avro version upgrade. I would suggest
>>> ignoring "dates" and mapping dateOfBirth as a long, since in any case -in
>>> Avro- the value is the Unix epoch. After this first approach, a design
>>> improvement would be great, though :)
>>>
>>> - Would it be good to have a "timestamp" type in the mapping so that
>>> KuduStore converts between the Entity long field <-> Kudu timestamp storage?
>>> - Is there any other approach?
>>>
>>
>> I think the Entity long field <-> Kudu timestamp conversion is the best
>> alternative right now, because it would add more compatible datatypes to
>> the mapping parameters that users can use. This conversion should not be
>> difficult to implement, in my opinion. Also, the new Date datatype of
>> Avro could be implemented in newer versions, because it would need further
>> analysis in other datastores too. I will work on that.
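
A minimal sketch of the Entity long <-> Kudu timestamp conversion discussed
above. It assumes the entity field holds epoch milliseconds (the thread does
not state the unit) and relies only on the fact that a UNIXTIME_MICROS column
stores microseconds since the Unix epoch. The class and method names are
hypothetical and are not part of Gora or Kudu.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical helper, not part of Gora or Kudu. */
final class TimestampConversion {
  private TimestampConversion() {}

  /** Entity long (ASSUMED to be epoch milliseconds) -> value for a UNIXTIME_MICROS column. */
  static long toKuduMicros(long epochMillis) {
    return TimeUnit.MILLISECONDS.toMicros(epochMillis);
  }

  /** UNIXTIME_MICROS value -> entity long in epoch milliseconds (sub-millisecond part is dropped). */
  static long toEntityMillis(long epochMicros) {
    // floorDiv rounds towards negative infinity, so dates before 1970 round consistently
    return Math.floorDiv(epochMicros, 1000L);
  }

  public static void main(String[] args) {
    long dateOfBirth = 631152000000L; // 1990-01-01T00:00:00Z in milliseconds
    long stored = toKuduMicros(dateOfBirth);
    System.out.println("stored micros: " + stored);
    System.out.println("round trip ok: " + (toEntityMillis(stored) == dateOfBirth));
  }
}
```

If the entity field turns out to use a different unit, only the two conversion
methods would need to change.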
>>
>>
>>>
>>>
>>> *> What is Gora's policy regarding flush()?*
>>> *> KuduClient has multiple flushing modes
>>> <https://kudu.apache.org/apidocs/org/apache/kudu/client/SessionConfiguration.FlushMode.html>
>>> and can also set a time interval
>>> <https://kudu.apache.org/releases/1.2.0/apidocs/org/apache/kudu/client/KuduSession.html#setFlushInterval-int->
>>> for automatic flush.*
>>> *> Should these behaviors be configurable using the gora.properties file,
>>> or should I just use the default configuration?*
>>>
>>> What we do in HBase is configure an autoflush option in gora.properties
>>> [2], which is used when instantiating the Table, but at the same time we
>>> implement the flush() method to force the flush [3]. I would suggest
>>> following that example, but adding the flushing options of Kudu. What
>>> flushing mode (and time interval, if it applies) do you suggest?
>>>
>>
>> Well, IMHO the default flush mode (auto flush sync) will do the job for
>> most use cases. But I will add a configuration in gora.properties for
>> selecting the other modes and specifying an autoflush time if needed by
>> the user.
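
A rough sketch of how such a configuration could be read. The two property
keys are hypothetical (nothing in this thread fixes their names), and the
local enum only mirrors the constant names of Kudu's
SessionConfiguration.FlushMode so that the example compiles without the Kudu
client. In the real store the parsed values would be handed to
KuduSession.setFlushMode and KuduSession.setFlushInterval.

```java
import java.util.Locale;
import java.util.Properties;

/** Hypothetical holder for flush settings read from gora.properties. */
final class KuduFlushSettings {
  /** Mirrors the constant names of Kudu's SessionConfiguration.FlushMode. */
  enum FlushMode { AUTO_FLUSH_SYNC, AUTO_FLUSH_BACKGROUND, MANUAL_FLUSH }

  // HYPOTHETICAL keys, chosen only for this example
  static final String FLUSH_MODE_KEY = "gora.datastore.kudu.flushMode";
  static final String FLUSH_INTERVAL_KEY = "gora.datastore.kudu.flushInterval";

  final FlushMode mode;
  final Integer intervalMillis; // null means "keep the client's default"

  private KuduFlushSettings(FlushMode mode, Integer intervalMillis) {
    this.mode = mode;
    this.intervalMillis = intervalMillis;
  }

  static KuduFlushSettings fromProperties(Properties props) {
    String rawMode = props.getProperty(FLUSH_MODE_KEY);
    // Fall back to auto flush sync when the user sets nothing
    FlushMode mode = rawMode == null
        ? FlushMode.AUTO_FLUSH_SYNC
        : FlushMode.valueOf(rawMode.trim().toUpperCase(Locale.ROOT));
    String rawInterval = props.getProperty(FLUSH_INTERVAL_KEY);
    Integer interval = rawInterval == null ? null : Integer.valueOf(rawInterval.trim());
    return new KuduFlushSettings(mode, interval);
  }

  public static void main(String[] args) {
    Properties props = new Properties();
    props.setProperty(FLUSH_MODE_KEY, "manual_flush");
    props.setProperty(FLUSH_INTERVAL_KEY, "500");
    KuduFlushSettings settings = fromProperties(props);
    System.out.println(settings.mode + " / " + settings.intervalMillis);
  }
}
```

An unknown mode name makes valueOf throw IllegalArgumentException; a real
store would probably want to report that as a configuration error.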
>>
>>
>>>
>>> *> Also, while reviewing the datastore interface I noticed this method
>>> 'getPartitions(Query<K, T> query)'. What is the expected behavior of this
>>> method? Should I use the partition definition in the XML mapping file for
>>> this?*
>>>
>>> The method getPartitions(Query) is related to Hadoop. Apache Gora
>>> integrates with Hadoop by implementing a custom Map and Reduce that allow
>>> getting/writing Entities directly.
>>> You can take a look at HBase's implementation [4], which relies on
>>> o.a.h.hbase.mapreduce.TableInputFormatBase [5] to compute the splits
>>> (start key---end key) with the location of each split, to create a
>>> collection of partitions [6].
>>>
>>> So, if Kudu allows performing computation using local Kudu splits,
>>> then this method does the preparation needed to "send the
>>> computation to where the data is locally".
>>>
>>> In any case, you can see that:
>>>
>>>    - MongoDB store implementation does not implement splitting [7]
>>>    - Cassandra store implementation does not implement splitting [8]
>>>    - Aerospike store implementation does not implement splitting [9]
>>>    - Accumulo store implementation *does* implement splitting [10]
>>>
>>> If Kudu has a method to get the different splits for a table and their
>>> locations, then you will be able to implement the full feature.
>>>
>>> This is Hadoop related and it is not trivial. I haven't elaborated much,
>>> so if you find you need more information let me know :)
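
To make the idea concrete, here is a simplified sketch of what computing
partitions amounts to: take the key ranges the storage reports (each with the
host serving it) and clip them to the key range of the query, skipping ranges
that do not overlap. All types and names here are hypothetical; this is not
the Gora PartitionQuery API nor anything from the Kudu client, and it uses
half-open ranges for simplicity.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch; none of these types exist in Gora or Kudu. */
final class PartitionSketch {

  /** Half-open key range [startKey, endKey); a null bound means "unbounded". */
  static final class KeyRange {
    final String startKey;
    final String endKey;
    final String location; // host that serves this range

    KeyRange(String startKey, String endKey, String location) {
      this.startKey = startKey;
      this.endKey = endKey;
      this.location = location;
    }
  }

  /** Clips every storage split to the query bounds and drops the ones that end up empty. */
  static List<KeyRange> getPartitions(List<KeyRange> splits, String queryStart, String queryEnd) {
    List<KeyRange> partitions = new ArrayList<>();
    for (KeyRange split : splits) {
      String start = laterStart(split.startKey, queryStart);
      String end = earlierEnd(split.endKey, queryEnd);
      boolean empty = start != null && end != null && start.compareTo(end) >= 0;
      if (!empty) {
        partitions.add(new KeyRange(start, end, split.location));
      }
    }
    return partitions;
  }

  // A null start means "from the beginning", so the non-null bound is the later one
  private static String laterStart(String a, String b) {
    if (a == null) return b;
    if (b == null) return a;
    return a.compareTo(b) >= 0 ? a : b;
  }

  // A null end means "until the end", so the non-null bound is the earlier one
  private static String earlierEnd(String a, String b) {
    if (a == null) return b;
    if (b == null) return a;
    return a.compareTo(b) <= 0 ? a : b;
  }

  public static void main(String[] args) {
    List<KeyRange> splits = Arrays.asList(
        new KeyRange(null, "g", "host-a"),
        new KeyRange("g", "p", "host-b"),
        new KeyRange("p", null, "host-c"));
    for (KeyRange p : getPartitions(splits, "h", "k")) {
      System.out.println(p.location + ": [" + p.startKey + ", " + p.endKey + ")");
    }
  }
}
```

Each resulting range keeps its location, which is the information Hadoop
needs to schedule a task close to the data.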
>>>
>>>
>>>
>> I will check whether Kudu has these features in order to implement this
>> method. If not, I will use the default implementation found in other
>> backends.
>>
>>
>>> About Queries, what I can tell you is that HBase only implements "Start
>>> key" + "End key" because it has only 2 operations: "get" and "scan", and
>>> the querying is for the "scan" operation, where you want an interval (or
>>> all) of the rows. Does Kudu have more querying functionality?
>>>
>>>
>> Yes, Kudu implements a Scanner for querying data, along with conditional
>> predicates for filtering. I am using those classes.
>>
>>
>>> On another topic, I am trying to install Kudu standalone (all in 1
>>> node). Do you use a Cloudera installation or do you have a standalone
>>> installation? How do you do it? I found some instructions, but they talk
>>> about compiling Kudu [11]. I was looking for something like HBase, where
>>> it is just unzip + execute "hbase start".
>>>
>>>
>> I am using an embedded mini-cluster, which comes with compiled binaries
>> and can be used with Maven [1], for testing my code. Once it is mature
>> enough, I think I will test the datastore with a Docker container [2].
>> I could not find an unzip+execute bundle either, and I am kind of a
>> newbie when it comes to compiling it myself.
>>
>> [1]
>> https://kudu.apache.org/docs/developing.html#_jvm_based_integration_testing
>> [2] https://hub.docker.com/r/usuresearch/apache-kudu/
>>
>>
>>> Good job and thank you!! :)
>>>
>>> Regards,
>>>
>>> Alfonso Nishikawa
>>>
>>>
>>> [1] - https://avro.apache.org/docs/1.8.0/spec.html#Logical+Types
>>> [2] -
>>> https://github.com/apache/gora/blob/apache-gora-0.9/gora-hbase/src/main/java/org/apache/gora/hbase/store/HBaseStore.java#L175
>>> [3] -
>>> https://github.com/apache/gora/blob/apache-gora-0.9/gora-hbase/src/main/java/org/apache/gora/hbase/store/HBaseStore.java#L458
>>> [4] -
>>> https://github.com/apache/gora/blob/apache-gora-0.9/gora-hbase/src/main/java/org/apache/gora/hbase/store/HBaseStore.java#L472
>>> [5] -
>>> https://github.com/apache/gora/blob/apache-gora-0.9/gora-hbase/src/main/java/org/apache/gora/hbase/store/HBaseStore.java#L479
>>> [6] -
>>> https://github.com/apache/gora/blob/apache-gora-0.9/gora-hbase/src/main/java/org/apache/gora/hbase/store/HBaseStore.java#L517
>>> [7] -
>>> https://github.com/apache/gora/blob/apache-gora-0.9/gora-mongodb/src/main/java/org/apache/gora/mongodb/store/MongoStore.java#L533
>>> [8] -
>>> https://github.com/apache/gora/blob/apache-gora-0.9/gora-cassandra/src/main/java/org/apache/gora/cassandra/store/CassandraStore.java#L292
>>> [9] -
>>> https://github.com/apache/gora/blob/apache-gora-0.9/gora-aerospike/src/main/java/org/apache/gora/aerospike/store/AerospikeStore.java#L369
>>> [10] -
>>> https://github.com/apache/gora/blob/apache-gora-0.9/gora-accumulo/src/main/java/org/apache/gora/accumulo/store/AccumuloStore.java#L902
>>> [11] - https://kudu.apache.org/docs/installation.html
>>>
>>>
>>> On Mon, Jul 8, 2019 at 3:42, John Mora (<jhnmora...@gmail.com>)
>>> wrote:
>>>
>>>> Hi all.
>>>>
>>>> As every week, I updated my report in the Wiki [1]. Also, I pushed my
>>>> latest commits to my branch [2]. Please take a look if you have time.
>>>>
>>>> This week, I will continue working on the Queries implementation.
>>>> Please reach out to me if you have any suggestions.
>>>>
>>>> Also, while reviewing the datastore interface I noticed this method
>>>> 'getPartitions(Query<K, T> query)'. What is the expected behavior of this
>>>> method? Should I use the partition definition in the XML mapping file for
>>>> this?
>>>>
>>>> Cheers,
>>>> John.
>>>>
>>>> [1]
>>>> https://cwiki.apache.org/confluence/display/GORA/GORA-485+Apache+Kudu+datastore+for+Gora+Reports
>>>> [2] https://github.com/jhnmora000/gora/tree/GORA-485
>>>>
>>>>
>>>> On Sun, Jun 30, 2019 at 16:56, John Mora (<jhnmora...@gmail.com>)
>>>> wrote:
>>>>
>>>>> Hi all.
>>>>>
>>>>> I received my first evaluation from the Google Summer of Code program
>>>>> with a positive result. Thanks so much for your support and confidence in
>>>>> the project and in me.
>>>>>
>>>>> I updated my report of this week in the Wiki[1]. Also, I pushed my
>>>>> last commits to my branch [2].
>>>>>
>>>>> This week, I will be reviewing the serialization/deserialization
>>>>> process in order to identify optimizations specific to Kudu, because I
>>>>> used generic methods from other backends, which could probably be better
>>>>> tuned for Kudu. Also, I will start working on the Queries implementation.
>>>>>
>>>>> BTW, I added a question to the wiki about Date types. Please take a
>>>>> look if you have time.
>>>>>
>>>>> [1]
>>>>> https://cwiki.apache.org/confluence/display/GORA/GORA-485+Apache+Kudu+datastore+for+Gora+Reports
>>>>> [2] https://github.com/jhnmora000/gora/tree/GORA-485
>>>>>
>>>>> Cheers,
>>>>> John
>>>>>
>>>>> On Thu, Jun 27, 2019 at 21:02, John Mora (<jhnmora...@gmail.com>)
>>>>> wrote:
>>>>>
>>>>>> Hi Carlos.
>>>>>>
>>>>>> Thanks for the reminder. I submitted the form yesterday. :D
>>>>>>
>>>>>> Best,
>>>>>> John.
>>>>>>
>>>>>> On Thu, Jun 27, 2019 at 17:34, carlos muñoz (<
>>>>>> carlosr...@gmail.com>) wrote:
>>>>>>
>>>>>>> Hi John
>>>>>>>
>>>>>>> The first Google Summer of Code evaluation is due on June 28th.
>>>>>>> Please make sure you submit your Mentors' evaluation on time.
>>>>>>>
>>>>>>> Regards,
>>>>>>> Carlos
>>>>>>>
>>>>>>> On Sun, Jun 23, 2019 at 18:29, John Mora (<jhnmora...@gmail.com>)
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi all.
>>>>>>>>
>>>>>>>> FYI, I updated my report of this week on the Wiki[1]. Also, I
>>>>>>>> pushed my last commits to my branch [2].
>>>>>>>>
>>>>>>>> As I mentioned in the reports, I would like to know how datastores
>>>>>>>> deal with flush(). Should it always be executed manually?
>>>>>>>>
>>>>>>>> Finally, this week I will be implementing object
>>>>>>>> serialization/deserialization in the methods put, get, delete, exists.
>>>>>>>> Do you have any suggestions on how to proceed with this task?
>>>>>>>>
>>>>>>>> Footnote: Thanks for the feedback Carlos, I fixed the problem.
>>>>>>>>
>>>>>>>> [1]
>>>>>>>> https://cwiki.apache.org/confluence/display/GORA/GORA-485+Apache+Kudu+datastore+for+Gora+Reports
>>>>>>>> [2] https://github.com/jhnmora000/gora/tree/GORA-485
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>> John
>>>>>>>>
>>>>>>>>
>>>>>>>> On Mon, Jun 17, 2019 at 22:58, carlos muñoz (<
>>>>>>>> carlosr...@gmail.com>) wrote:
>>>>>>>>
>>>>>>>>> Hi John
>>>>>>>>>
>>>>>>>>> Your last changes look good to me. Keep it up. But I noticed that
>>>>>>>>> you have created an Enumeration for datatypes [1], which is very
>>>>>>>>> similar to the kudu-client's [2]. You should probably replace [1]
>>>>>>>>> with [2] in order to avoid code duplication.
>>>>>>>>>
>>>>>>>>> [1]
>>>>>>>>> https://github.com/jhnmora000/gora/blob/GORA-485/gora-kudu/src/main/java/org/apache/gora/kudu/mapping/Column.java#L76
>>>>>>>>> [2] https://kudu.apache.org/apidocs/org/apache/kudu/Type.html
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Best,
>>>>>>>>> Carlos
>>>>>>>>>
>>>>>>>>> On Sat, Jun 15, 2019 at 12:01, John Mora (<
>>>>>>>>> jhnmora...@gmail.com>) wrote:
>>>>>>>>>
>>>>>>>>>> Hi all.
>>>>>>>>>>
>>>>>>>>>> I updated my report of this week on the Wiki [1]. I noticed that
>>>>>>>>>> my code is lacking some Javadoc documentation, so I think I will be
>>>>>>>>>> working on that this week. I would also like to enable and check the
>>>>>>>>>> schema management tests (createSchema, existsSchema, etc.).
>>>>>>>>>>
>>>>>>>>>> [1]
>>>>>>>>>> https://cwiki.apache.org/confluence/display/GORA/GORA-485+Apache+Kudu+datastore+for+Gora+Reports
>>>>>>>>>>
>>>>>>>>>> Cheers,
>>>>>>>>>> John.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Tue, Jun 11, 2019 at 0:11, John Mora (<
>>>>>>>>>> jhnmora...@gmail.com>) wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Alfonso.
>>>>>>>>>>>
>>>>>>>>>>> Thanks so much for your feedback. I am working on your comments.
>>>>>>>>>>>
>>>>>>>>>>> Best,
>>>>>>>>>>> John
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Jun 10, 2019 at 16:11, Alfonso Nishikawa (<
>>>>>>>>>>> alfonso.nishik...@gmail.com>) wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi, John.
>>>>>>>>>>>>
>>>>>>>>>>>> Regarding your questions at the report [1]:
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>    - How to represent partitioning configurations on the
>>>>>>>>>>>>    mapping file.
>>>>>>>>>>>>
>>>>>>>>>>>> This was discussed in other emails, wasn't it? :)
>>>>>>>>>>>>
>>>>>>>>>>>>    - KuduTestHarness requires the Maven plugin
>>>>>>>>>>>>    os-maven-plugin, which needs Maven 3.1.1+. Is that a problem for
>>>>>>>>>>>>    Apache Gora?
>>>>>>>>>>>>
>>>>>>>>>>>> I believe it is not a problem. My Ubuntu comes with 3.6.0, well
>>>>>>>>>>>> past 3.1.1, and I assume everyone uses a fairly recent version of
>>>>>>>>>>>> Maven 3 :)
>>>>>>>>>>>>
>>>>>>>>>>>> [1] -
>>>>>>>>>>>> https://cwiki.apache.org/confluence/display/GORA/GORA-485+Apache+Kudu+datastore+for+Gora+Reports
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Regards,
>>>>>>>>>>>>
>>>>>>>>>>>> Alfonso Nishikawa
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Mon, Jun 10, 2019 at 21:07, Alfonso Nishikawa (<
>>>>>>>>>>>> alfonso.nishik...@gmail.com>) wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi, John.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thank you!
>>>>>>>>>>>>> Things I have seen:
>>>>>>>>>>>>>
>>>>>>>>>>>>> - The version of a Maven dependency [1] should go in the
>>>>>>>>>>>>> Dependency Management section of the root pom [2]. The same applies
>>>>>>>>>>>>> to [3] and the dependencies after it: the version should not be set
>>>>>>>>>>>>> there.
>>>>>>>>>>>>> - Set the scope of the test dependencies to test, at [4] and onwards.
>>>>>>>>>>>>> - Set the indentation to 2 spaces for the pom [5].
>>>>>>>>>>>>> - Missing "t" in "localhost" at [6].
>>>>>>>>>>>>> - Port 13 for Kudu? That is the "Daytime Protocol" (RFC 867) and you
>>>>>>>>>>>>> will need root permission to run it. The default port for Kudu is
>>>>>>>>>>>>> 7051, isn't it?
>>>>>>>>>>>>> - I would ask you to add the same functionality to load the
>>>>>>>>>>>>> mapping from configuration as in HBase's store [7] to your
>>>>>>>>>>>>> KuduStore [8]. This will have implications for your readMapping at
>>>>>>>>>>>>> [9], so take a look at the one for HBase at [10].
>>>>>>>>>>>>> - I know it is done in other backends, but avoid RuntimeExceptions
>>>>>>>>>>>>> (at least in Java, since we have the checked ones) like in [11].
>>>>>>>>>>>>> You can wrap them in GoraException. An example is [12].
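
A small illustration of the last point, wrapping a runtime failure in a
checked exception. StoreException below is only a stand-in defined for this
example so that it compiles on its own; in the real code the wrapper would be
GoraException, and parsePort is a hypothetical method rather than anything
from KuduMappingBuilder.

```java
/** Hypothetical example; the real code would use GoraException instead of StoreException. */
final class MappingReaderSketch {

  /** Stand-in checked exception, defined here only to keep the example self-contained. */
  static final class StoreException extends Exception {
    StoreException(String message, Throwable cause) {
      super(message, cause);
    }
  }

  /** Parses a port from a configuration value and reports failures as a checked exception. */
  static int parsePort(String raw) throws StoreException {
    try {
      return Integer.parseInt(raw.trim());
    } catch (RuntimeException e) {
      // Covers NumberFormatException (bad number) and NullPointerException (missing value)
      throw new StoreException("Invalid port value: " + raw, e);
    }
  }

  public static void main(String[] args) {
    try {
      System.out.println(parsePort("7051"));
      parsePort("not-a-port");
    } catch (StoreException e) {
      System.out.println(e.getMessage() + " (cause: " + e.getCause().getClass().getSimpleName() + ")");
    }
  }
}
```

The original exception is kept as the cause, so the stack trace still shows
where the failure started.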
>>>>>>>>>>>>>
>>>>>>>>>>>>> And nothing more :)
>>>>>>>>>>>>> Keep going, good job.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> [1] -
>>>>>>>>>>>>> https://github.com/jhnmora000/gora/blob/GORA-485/gora-kudu/pom.xml#L98
>>>>>>>>>>>>> [2] -
>>>>>>>>>>>>> https://github.com/jhnmora000/gora/blob/GORA-485/pom.xml#L890
>>>>>>>>>>>>> [3] -
>>>>>>>>>>>>> https://github.com/jhnmora000/gora/blob/GORA-485/gora-kudu/pom.xml#L121
>>>>>>>>>>>>> [4] -
>>>>>>>>>>>>> https://github.com/jhnmora000/gora/blob/GORA-485/gora-kudu/pom.xml#L180
>>>>>>>>>>>>> [5] -
>>>>>>>>>>>>> https://github.com/jhnmora000/gora/blob/GORA-485/gora-kudu/pom.xml
>>>>>>>>>>>>> [6] -
>>>>>>>>>>>>> https://github.com/jhnmora000/gora/blob/GORA-485/gora-kudu/src/test/resources/gora.properties#L18
>>>>>>>>>>>>> [7] -
>>>>>>>>>>>>> https://github.com/jhnmora000/gora/blob/master/gora-hbase/src/main/java/org/apache/gora/hbase/store/HBaseStore.java#L92
>>>>>>>>>>>>> [8] -
>>>>>>>>>>>>> https://github.com/jhnmora000/gora/blob/GORA-485/gora-kudu/src/main/java/org/apache/gora/kudu/store/KuduStore.java#L53
>>>>>>>>>>>>> [9] -
>>>>>>>>>>>>> https://github.com/jhnmora000/gora/blob/GORA-485/gora-kudu/src/main/java/org/apache/gora/kudu/mapping/KuduMappingBuilder.java#L81
>>>>>>>>>>>>> [10] -
>>>>>>>>>>>>> https://github.com/jhnmora000/gora/blob/master/gora-hbase/src/main/java/org/apache/gora/hbase/store/HBaseStore.java#L822
>>>>>>>>>>>>> [11] -
>>>>>>>>>>>>> https://github.com/jhnmora000/gora/blob/GORA-485/gora-kudu/src/main/java/org/apache/gora/kudu/mapping/KuduMappingBuilder.java#L141
>>>>>>>>>>>>> [12] -
>>>>>>>>>>>>> https://github.com/jhnmora000/gora/blob/master/gora-hbase/src/main/java/org/apache/gora/hbase/store/HBaseStore.java#L268
>>>>>>>>>>>>>
>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Alfonso Nishikawa
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Sat, Jun 8, 2019 at 20:26, John Mora (<
>>>>>>>>>>>>> jhnmora...@gmail.com>) wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi all.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I have just updated my weekly reports on Cwiki [1]. This coming
>>>>>>>>>>>>>> week, I think I should focus on the create schema operation and
>>>>>>>>>>>>>> on solving the issue of the partitioning configurations in the
>>>>>>>>>>>>>> mapping file.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Please let me know if you have suggestions. My latest commits
>>>>>>>>>>>>>> are available here [2].
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> [1]
>>>>>>>>>>>>>> https://cwiki.apache.org/confluence/display/GORA/GORA-485+Apache+Kudu+datastore+for+Gora+Reports
>>>>>>>>>>>>>> [2] https://github.com/jhnmora000/gora/tree/GORA-485
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>> John
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
