[ https://issues.apache.org/jira/browse/HBASE-20748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Charles PORROT updated HBASE-20748:
-----------------------------------
    Description: 
The _bulkLoad_ methods of class _org.apache.hadoop.hbase.spark.HBaseContext_ use the system's current time as the version of the cells to bulk-load. This makes _bulkLoad_, and its twin _bulkLoadThinRows_, unusable if you need your own versioning scheme:
{code:java}
//Here is where we finally iterate through the data in this partition of the
//RDD that has been sorted and partitioned
it.foreach{
  case (keyFamilyQualifier, cellValue:Array[Byte]) =>
    val wl = writeValueToHFile(
      keyFamilyQualifier.rowKey,
      keyFamilyQualifier.family,
      keyFamilyQualifier.qualifier,
      cellValue,
      nowTimeStamp,
      fs,
      conn,
      localTableName,
      conf,
      familyHFileWriteOptionsMapInternal,
      hfileCompression,
      writerMap,
      stagingDir)
...{code}

Thus, I propose a third _bulkLoad_ method, based on the original method. Instead of using an _Iterator(KeyFamilyQualifier, Array[Byte])_ as the basis for the writes, this new method would use an _Iterator(KeyFamilyQualifier, Array[Byte], Long)_, with the _Long_ being the version.

If the version is invalid (for instance, negative), the method would fall back to the current timestamp.

See the attached file for a proposal of this new _bulkLoad_ method.

  was:
The _bulkLoad_ methods of class _org.apache.hadoop.hbase.spark.HBaseContext_ use the system's current time as the version of the cells to bulk-load. This makes _bulkLoad_, and its twin _bulkLoadThinRows_, unusable if you need your own versioning scheme.

Thus, I propose a third _bulkLoad_ method, based on the original method. Instead of using an _Iterator(KeyFamilyQualifier, Array[Byte])_ as the basis for the writes, this new method would use an _Iterator(KeyFamilyQualifier, Array[Byte], Long)_, with the _Long_ being the version.

If the version is invalid (for instance, negative), the method would fall back to the current timestamp.
See the attached file for a proposal of this new _bulkLoad_ method.


> HBaseContext bulkLoad: being able to use custom versions
> --------------------------------------------------------
>
>                 Key: HBASE-20748
>                 URL: https://issues.apache.org/jira/browse/HBASE-20748
>             Project: HBase
>          Issue Type: Improvement
>          Components: spark
>            Reporter: Charles PORROT
>            Priority: Major
>              Labels: HBaseContext, bulkload, spark, versions
>         Attachments: bulkLoadCustomVersions.scala
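
To make the proposed fallback rule concrete, here is a minimal sketch of how a version carried in the third tuple element could be resolved before the write. It is not the content of the attached bulkLoadCustomVersions.scala, and it makes several assumptions: the names _VersionedBulkLoadSketch_, _resolveVersion_ and _withResolvedVersions_ are hypothetical, _KeyFamilyQualifier_ is redefined as a small stand-in so the snippet compiles without HBase on the classpath, and "invalid" is taken to mean only "negative", the one example the description gives.

```scala
// Hypothetical stand-in for org.apache.hadoop.hbase.spark.KeyFamilyQualifier,
// declared here only so the sketch is self-contained.
final case class KeyFamilyQualifier(
    rowKey: Array[Byte],
    family: Array[Byte],
    qualifier: Array[Byte])

object VersionedBulkLoadSketch {

  // Assumed validity rule: a negative version is rejected and replaced
  // by the current timestamp; any other value is used as given.
  def resolveVersion(version: Long, nowTimeStamp: Long): Long =
    if (version < 0L) nowTimeStamp else version

  // Maps each (key, value, version) triple to the triple that would be
  // written, with the version replaced by its resolved timestamp.
  def withResolvedVersions(
      it: Iterator[(KeyFamilyQualifier, Array[Byte], Long)],
      nowTimeStamp: Long): Iterator[(KeyFamilyQualifier, Array[Byte], Long)] =
    it.map { case (keyFamilyQualifier, cellValue, version) =>
      (keyFamilyQualifier, cellValue, resolveVersion(version, nowTimeStamp))
    }
}
```

In the loop quoted in the description, the resolved value would presumably take the place of _nowTimeStamp_ in the call to _writeValueToHFile_; the remaining arguments would stay unchanged.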