Thanks for sharing. I will analyze your Postgres solution.
Thanks!

Regards,
Iván

On Sun, Feb 10, 2019 at 12:45, Driesprong, Fokko (<[email protected]>) wrote:
> Looking good Nicolas, thanks for sharing.
>
> Since there is also PySpark support, it should be relatively straightforward
> to invoke the spark-postgres library from Airflow.
>
> Cheers, Fokko
>
> On Sat, Feb 9, 2019 at 12:16, Nicolas Paris <[email protected]> wrote:
>
> > Hi
> >
> > Be careful with Spark JDBC as a replacement for Sqoop on large tables.
> > Sqoop can handle a source table of any size, while the Spark JDBC design
> > cannot. Although it provides a way to distribute the read over multiple
> > partitions, Spark is limited by executor memory, whereas Sqoop is limited
> > only by HDFS space.
> >
> > As a result, I have written a Spark library (for Postgres only right
> > now) which overcomes the core Spark JDBC limitations. It handles any
> > workload, and in my tests it was 8 times faster than Sqoop. I have not
> > tested it with Airflow, but it is compatible with Apache Livy and
> > PySpark.
> >
> > https://github.com/EDS-APHP/spark-postgres
> >
> >
> > On Fri, Feb 01, 2019 at 01:53:57PM +0100, Iván Robla Albarrán wrote:
> > > Hi,
> > >
> > > I am searching for a way to replace Apache Sqoop.
> > >
> > > I am looking at SparkJDBCOperator, but I don't understand how to use it.
> > >
> > > Is it a version of the SparkSubmitOperator where I include a JDBC
> > > connection? Do I need to write Spark code myself?
> > >
> > > Any example?
> > >
> > > Thanks, I am very lost.
> > >
> > > Regards,
> > > Iván Robla
> >
> > --
> > nicolas
>
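To make the partitioning point above concrete: Spark's JDBC reader (e.g. `spark.read.jdbc(url, table, column=..., lowerBound=..., upperBound=..., numPartitions=...)`) splits the numeric partition column's range into one WHERE predicate per partition, and each predicate becomes one query run by one task. Below is a minimal plain-Python sketch of that range-splitting idea — an illustration of the technique, not Spark's actual source code:

```python
# Illustrative sketch of Spark-JDBC-style range partitioning:
# split [lower_bound, upper_bound) on a numeric column into one
# WHERE predicate per partition. Function name and edge handling
# are assumptions for illustration, not Spark's exact implementation.

def column_partition(column, lower_bound, upper_bound, num_partitions):
    """Return one SQL WHERE predicate per partition."""
    stride = (upper_bound - lower_bound) // num_partitions
    predicates = []
    current = lower_bound
    for i in range(num_partitions):
        if num_partitions == 1:
            predicates.append("1 = 1")  # single partition: no filter
        elif i == 0:
            # First partition also sweeps up NULLs and values below the bound.
            predicates.append(f"{column} < {current + stride} OR {column} IS NULL")
        elif i == num_partitions - 1:
            # Last partition is open-ended to catch values above upper_bound.
            predicates.append(f"{column} >= {current}")
        else:
            predicates.append(f"{column} >= {current} AND {column} < {current + stride}")
        current += stride
    return predicates

for pred in column_partition("id", 0, 1000, 4):
    print(pred)
```

Each predicate maps to one executor task, so a skewed or under-partitioned column can still push one partition's rows beyond executor memory — the limitation Nicolas describes — whereas Sqoop mappers write straight to HDFS files.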
