Re: [DISCUSSION] Propose to remove Sqoop

Yuqi Gu Mon, 08 Aug 2022 20:01:10 -0700

Thank you guys for the comments.

Sqoop is a data migration tool that transfers data between Hadoop
systems (HDFS, Hive, HBase...) and RDBMSs (MySQL, Oracle, PostgreSQL...).
Sqoop depends on MapReduce. Some people developed a feature to
support running Sqoop on Spark, but it was never merged into Sqoop
trunk and is not officially supported by Sqoop.


Spark can read data from an RDBMS directly over JDBC and convert it
into DataFrames, with no need for Sqoop to transfer the data.
Spark code (Scala, Java, JDBC configuration, SQL) may be hard for
newcomers to understand and maintain.
Even so, Spark is a credible alternative to Sqoop, and a flexible and
simple tool that can greatly improve the efficiency of data migration,
extraction, and processing.
IMO, it makes sense to remove Sqoop from Bigtop.
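
For illustration, here is a minimal PySpark-style sketch of the direct JDBC
read described above. It is only a sketch under stated assumptions: the
helper names (jdbc_read_options, read_table), the host, the table and the
credentials are hypothetical placeholders, not something from Bigtop or
Sqoop. The option keys (url, dbtable, user, password, partitionColumn,
lowerBound, upperBound, numPartitions) are the standard Spark JDBC data
source options.

```python
from typing import Dict, Optional


def jdbc_read_options(
    url: str,
    table: str,
    user: str,
    password: str,
    partition_column: Optional[str] = None,
    lower_bound: Optional[int] = None,
    upper_bound: Optional[int] = None,
    num_partitions: Optional[int] = None,
) -> Dict[str, str]:
    """Build the option map for Spark's JDBC data source (hypothetical helper)."""
    options = {"url": url, "dbtable": table, "user": user, "password": password}
    if partition_column is not None:
        # Spark requires these four options to be given together
        # when a partitioned (parallel) read is requested.
        if None in (lower_bound, upper_bound, num_partitions):
            raise ValueError(
                "partition_column needs lower_bound, upper_bound and num_partitions"
            )
        options.update(
            {
                "partitionColumn": partition_column,
                "lowerBound": str(lower_bound),
                "upperBound": str(upper_bound),
                "numPartitions": str(num_partitions),
            }
        )
    return options


def read_table(spark, options: Dict[str, str]):
    """Load an RDBMS table as a DataFrame; `spark` is an existing SparkSession."""
    return spark.read.format("jdbc").options(**options).load()


# Example option map for a hypothetical MySQL table, read in 4 parallel slices.
example_options = jdbc_read_options(
    "jdbc:mysql://db.example.com:3306/sales",  # placeholder URL
    "orders",                                  # placeholder table
    "etl_user",
    "secret",
    partition_column="id",
    lower_bound=1,
    upper_bound=1000,
    num_partitions=4,
)
```

With a real SparkSession, read_table(spark, example_options) would return a
DataFrame, which could then be written to HDFS or Hive. The matching JDBC
driver jar has to be on the Spark classpath for this to work.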

Thanks,

BRs,
Yuqi
