[ https://issues.apache.org/jira/browse/HBASE-20748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Charles PORROT updated HBASE-20748:
-----------------------------------
Description:
The _bulkLoad_ methods of class _org.apache.hadoop.hbase.spark.HBaseContext_
use the system's current time as the version of every cell they bulk-load.
This makes this method, and its twin _bulkLoadThinRows_, unsuitable if you need
to use your own versioning system.
Thus, I propose a third _bulkLoad_ method, based on the original one.
Instead of using an _Iterator[(KeyFamilyQualifier, Array[Byte])]_ as the basis
for the writes, this new method would use an
_Iterator[(KeyFamilyQualifier, Array[Byte], Long)]_, with the _Long_ being the
version.
If a version is invalid (for instance, negative), the method would fall back to
the current timestamp.
See the attached file for a proposal of this new _bulkLoad_ method.
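To illustrate the proposed input shape and the fallback rule, here is a minimal Scala sketch. It is not the attached bulkLoadCustomVersions.scala and does not touch HBaseContext itself: _KeyFamilyQualifierStub_, _resolveVersion_ and _withResolvedVersions_ are hypothetical names, the stub stands in for the real _KeyFamilyQualifier_ so the example compiles without HBase on the classpath, and treating only negative values as invalid is an assumption based on the example given above.

```scala
// Sketch only: resolving the version of each cell when the caller supplies
// (key, value, version) tuples instead of (key, value) pairs.

// Hypothetical stand-in for org.apache.hadoop.hbase.spark.KeyFamilyQualifier,
// so this file compiles without HBase dependencies.
final case class KeyFamilyQualifierStub(
    rowKey: Array[Byte],
    family: Array[Byte],
    qualifier: Array[Byte])

object CustomVersionSketch {

  /** Returns `version` if it is usable, otherwise `now`.
    * Assumption: "invalid" means negative; other rules could be plugged in here.
    */
  def resolveVersion(version: Long, now: Long): Long =
    if (version < 0L) now else version

  /** Maps caller-supplied tuples to (key, value, resolvedVersion).
    * The clock is read once per call, so every fallback cell in the batch
    * shares the same timestamp.
    */
  def withResolvedVersions(
      cells: Iterator[(KeyFamilyQualifierStub, Array[Byte], Long)],
      now: Long = System.currentTimeMillis()
  ): Iterator[(KeyFamilyQualifierStub, Array[Byte], Long)] =
    cells.map { case (kfq, value, version) =>
      (kfq, value, resolveVersion(version, now))
    }

  def main(args: Array[String]): Unit = {
    val kfq = KeyFamilyQualifierStub(
      "row1".getBytes("UTF-8"), "cf".getBytes("UTF-8"), "q".getBytes("UTF-8"))
    val input = Iterator(
      (kfq, "v1".getBytes("UTF-8"), 42L),  // custom version, kept as-is
      (kfq, "v2".getBytes("UTF-8"), -1L))  // invalid version, falls back to `now`
    val versions = withResolvedVersions(input, now = 1000L).map(_._3).toList
    println(versions) // List(42, 1000)
  }
}
```

Passing `now` explicitly, as in `main`, keeps the fallback deterministic in tests; in a real bulk load it would default to the current time.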
was:
The _bulkLoad_ methods of _class org.apache.hadoop.hbase.spark_ use the
system's current time for the version of the cells to bulk-load.
This makes this method, and its twin _bulkLoadThinRows_, useless if you need to
use your own versioning system.
Thus, I propose a third _bulkLoad_ method, based on the original method.
Instead of using an _Iterator[(KeyFamilyQualifier, Array[Byte])]_ as the basis
for the writes, this new method would use an
_Iterator[(KeyFamilyQualifier, Array[Byte], Long)]_, with the _Long_ being the
version.
If a version is invalid (for instance, negative), the method would fall back to
the current timestamp.
See the attached file for a proposal of this new _bulkLoad_ method.
> HBaseContext bulkLoad: being able to use custom versions
> --------------------------------------------------------
>
> Key: HBASE-20748
> URL: https://issues.apache.org/jira/browse/HBASE-20748
> Project: HBase
> Issue Type: Improvement
> Components: spark
> Reporter: Charles PORROT
> Priority: Major
> Labels: HBaseContext, bulkload, spark, versions
> Attachments: bulkLoadCustomVersions.scala
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)