taklwu commented on a change in pull request #88:
URL: https://github.com/apache/hbase-connectors/pull/88#discussion_r783384657
##########
File path: pom.xml
##########
@@ -129,20 +129,20 @@
<compileSource>1.8</compileSource>
<java.min.version>${compileSource}</java.min.version>
<maven.min.version>3.5.0</maven.min.version>
- <hbase.version>2.2.2</hbase.version>
+ <hbase.version>2.4.8</hbase.version>
<exec.maven.version>1.6.0</exec.maven.version>
<audience-annotations.version>0.5.0</audience-annotations.version>
<junit.version>4.12</junit.version>
- <hbase-thirdparty.version>2.2.1</hbase-thirdparty.version>
+ <hbase-thirdparty.version>3.5.1</hbase-thirdparty.version>
Review comment:
Do you see any problem with using 4.0.1? In #89 it does not seem to be a problem.
##########
File path: pom.xml
##########
@@ -129,20 +129,20 @@
<compileSource>1.8</compileSource>
<java.min.version>${compileSource}</java.min.version>
<maven.min.version>3.5.0</maven.min.version>
- <hbase.version>2.2.2</hbase.version>
+ <hbase.version>2.4.8</hbase.version>
<exec.maven.version>1.6.0</exec.maven.version>
<audience-annotations.version>0.5.0</audience-annotations.version>
<junit.version>4.12</junit.version>
- <hbase-thirdparty.version>2.2.1</hbase-thirdparty.version>
+ <hbase-thirdparty.version>3.5.1</hbase-thirdparty.version>
<hadoop-two.version>2.8.5</hadoop-two.version>
- <hadoop-three.version>3.0.3</hadoop-three.version>
- <hadoop.version>${hadoop-two.version}</hadoop.version>
+ <hadoop-three.version>3.2.0</hadoop-three.version>
+ <hadoop.version>${hadoop-three.version}</hadoop.version>
<slf4j.version>1.7.25</slf4j.version>
<log4j.version>1.2.17</log4j.version>
- <checkstyle.version>8.18</checkstyle.version>
- <maven.checkstyle.version>3.1.0</maven.checkstyle.version>
- <surefire.version>3.0.0-M4</surefire.version>
- <enforcer.version>3.0.0-M3</enforcer.version>
+ <checkstyle.version>8.45.1</checkstyle.version>
Review comment:
Can you tell me why you chose this version instead of 8.28, which is used on the hbase master branch?
##########
File path: spark/pom.xml
##########
@@ -44,13 +44,13 @@
<properties>
<protobuf.plugin.version>0.6.1</protobuf.plugin.version>
- <hbase-thirdparty.version>2.1.0</hbase-thirdparty.version>
- <jackson.version>2.9.10</jackson.version>
- <spark.version>2.4.0</spark.version>
+ <hbase-thirdparty.version>3.5.1</hbase-thirdparty.version>
Review comment:
remove this line
```suggestion
```
##########
File path: spark/hbase-spark/src/main/scala/org/apache/hadoop/hbase/spark/HBaseContext.scala
##########
@@ -902,8 +902,8 @@ class HBaseContext(@transient val sc: SparkContext,
tempConf.setFloat(HConstants.HFILE_BLOCK_CACHE_SIZE_KEY, 0.0f)
val contextBuilder = new HFileContextBuilder()
.withCompression(Algorithm.valueOf(familyOptions.compression))
- .withChecksumType(HStore.getChecksumType(conf))
- .withBytesPerCheckSum(HStore.getBytesPerChecksum(conf))
+ .withChecksumType(StoreUtils.getChecksumType(conf))
+ .withBytesPerCheckSum(StoreUtils.getBytesPerChecksum(conf))
Review comment:
```suggestion
      // HBASE-25249 introduced an incompatible change in the IA.Private HStore and StoreUtils,
      // so here we directly use conf.get for ChecksumType and BytesPerChecksum to make it
      // compatible between hbase 2.3.x and 2.4.x
      val contextBuilder = new HFileContextBuilder()
        .withCompression(Algorithm.valueOf(familyOptions.compression))
        // ChecksumType.nameToType is still an IA.Private util, but it's unlikely to be changed.
        .withChecksumType(ChecksumType
          .nameToType(conf.get(HConstants.CHECKSUM_TYPE_NAME,
            ChecksumType.getDefaultChecksumType.getName)))
        .withCellComparator(CellComparator.getInstance())
        .withBytesPerCheckSum(conf.getInt(HConstants.BYTES_PER_CHECKSUM,
          HFile.DEFAULT_BYTES_PER_CHECKSUM))
        .withBlockSize(familyOptions.blockSize)
```
Basically, I want to make it compatible with hbase-2.3.7 as well.
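For anyone skimming this thread, the idea behind the suggestion above is to read the raw configuration keys with explicit defaults instead of calling IA.Private helpers whose signatures changed between HBase releases. A toy Java sketch of that pattern, with `java.util.Properties` standing in for the Hadoop `Configuration` and illustrative (not authoritative) key names and defaults:

```java
import java.util.Properties;

public class ChecksumConfSketch {
    // Illustrative stand-ins for HConstants.CHECKSUM_TYPE_NAME / BYTES_PER_CHECKSUM
    static final String CHECKSUM_TYPE_KEY = "hbase.hstore.checksum.algorithm";
    static final String BYTES_PER_CHECKSUM_KEY = "hbase.hstore.bytes.per.checksum";
    static final String DEFAULT_CHECKSUM_TYPE = "CRC32C";
    static final int DEFAULT_BYTES_PER_CHECKSUM = 16 * 1024;

    // Reading the raw keys with defaults keeps the caller independent of
    // private helper classes whose APIs changed between hbase 2.3.x and 2.4.x.
    static String checksumType(Properties conf) {
        return conf.getProperty(CHECKSUM_TYPE_KEY, DEFAULT_CHECKSUM_TYPE);
    }

    static int bytesPerChecksum(Properties conf) {
        String v = conf.getProperty(BYTES_PER_CHECKSUM_KEY);
        return v == null ? DEFAULT_BYTES_PER_CHECKSUM : Integer.parseInt(v);
    }

    public static void main(String[] args) {
        Properties conf = new Properties();         // empty conf: defaults apply
        System.out.println(checksumType(conf));     // CRC32C
        System.out.println(bytesPerChecksum(conf)); // 16384

        conf.setProperty(BYTES_PER_CHECKSUM_KEY, "4096");
        System.out.println(bytesPerChecksum(conf)); // 4096
    }
}
```

The same lookup-with-default shape is what `conf.get(...)`/`conf.getInt(...)` give in the Scala suggestion, without touching `HStore` or `StoreUtils`.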
##########
File path: spark/README.md
##########
@@ -18,19 +18,25 @@ limitations under the License.
# Apache HBase™ Spark Connector
-## Scala and Spark Versions
+## Spark, Scala and Configurable Options
-To generate an artifact for a different [spark version](https://mvnrepository.com/artifact/org.apache.spark/spark-core) and/or [scala version](https://www.scala-lang.org/download/all.html), pass command-line options as follows (changing version numbers appropriately):
+To generate an artifact for a different [Spark version](https://mvnrepository.com/artifact/org.apache.spark/spark-core) and/or [Scala version](https://www.scala-lang.org/download/all.html),
+[Hadoop version](https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-core), or [HBase version](https://mvnrepository.com/artifact/org.apache.hbase/hbase), pass command-line options as follows (changing version numbers appropriately):
```
-$ mvn -Dspark.version=2.2.2 -Dscala.version=2.11.7 -Dscala.binary.version=2.11 clean install
+$ mvn -Dspark.version=3.1.2 -Dscala.version=2.12.10 -Dhadoop-three.version=3.2.0 -Dscala.binary.version=2.12 -Dhbase.version=2.4.8 clean install
```
----
-To build the connector with Spark 3.0, compile it with scala 2.12.
-Additional configurations that you can customize are the Spark version, HBase version, and Hadoop version.
-Example:
+Note: to build the connector with Spark 2.x, compile it with `-Dscala.binary.version=2.11` and use the profile `-Dhadoop.profile=2.0`
+
+## Configuration and Installation
+**Client-side** (Spark) configuration:
+- The HBase configuration file `hbase-site.xml` should be made available to Spark; it can be copied to `$SPARK_CONF_DIR` (default is `$SPARK_HOME/conf`)
+
+**Server-side** (HBase region servers) configuration:
+- The following jars needs to be in the CLASSPATH of the HBase region servers:
Review comment:
```suggestion
- The following jars need to be in the CLASSPATH of the HBase region servers:
```
##########
File path: pom.xml
##########
@@ -129,20 +129,20 @@
<compileSource>1.8</compileSource>
<java.min.version>${compileSource}</java.min.version>
<maven.min.version>3.5.0</maven.min.version>
- <hbase.version>2.2.2</hbase.version>
+ <hbase.version>2.4.8</hbase.version>
Review comment:
Use 2.4.9.
##########
File path: spark/hbase-spark/pom.xml
##########
@@ -345,20 +347,16 @@
</dependency>
</dependencies>
</profile>
- <!--
- profile for building against Hadoop 3.0.x. Activate using:
- mvn -Dhadoop.profile=3.0
- -->
+ <!-- profile against Hadoop 3.x: This is the default. -->
<profile>
<id>hadoop-3.0</id>
<activation>
<property>
- <name>hadoop.profile</name>
- <value>3.0</value>
+ <name>!hadoop.profile</name>
</property>
</activation>
<properties>
- <hadoop.version>3.0</hadoop.version>
+ <hadoop.version>3.2.0</hadoop.version>
Review comment:
```suggestion
<hadoop.version>${hadoop-three.version}</hadoop.version>
```
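As a usage note on the activation trick in this hunk: a Maven profile whose activation property name starts with `!` is active exactly when that property is not set, so passing any `-Dhadoop.profile=...` value switches the default off. A minimal standalone sketch of the paired profiles (the hadoop-2.0 counterpart's shape is assumed, not quoted from this pom):

```xml
<profiles>
  <!-- Default: active whenever -Dhadoop.profile is NOT given on the command line -->
  <profile>
    <id>hadoop-3.0</id>
    <activation>
      <property>
        <name>!hadoop.profile</name>
      </property>
    </activation>
    <properties>
      <hadoop.version>${hadoop-three.version}</hadoop.version>
    </properties>
  </profile>
  <!-- Opt-in via: mvn -Dhadoop.profile=2.0 clean install -->
  <profile>
    <id>hadoop-2.0</id>
    <activation>
      <property>
        <name>hadoop.profile</name>
        <value>2.0</value>
      </property>
    </activation>
    <properties>
      <hadoop.version>${hadoop-two.version}</hadoop.version>
    </properties>
  </profile>
</profiles>
```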
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]