Hi Iván,

The SparkJDBCOperator is an effort to replace Sqoop. For example, if you
run Spark on Kubernetes, you can use Spark to run your Sqoop-style
workloads as well. Please keep in mind that this operator is not as rich
in functionality as Sqoop. The original PR is here:
https://github.com/apache/airflow/pull/3021

The PySpark code is already available here:
https://github.com/apache/incubator-airflow/blob/master/airflow/contrib/hooks/spark_jdbc_script.py
The operator passes all the arguments on to this script, so you won't
have to write any Spark code yourself.
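
To give you a feel for what the script does for the jdbc_to_spark
direction, it boils down to the read-over-JDBC / write-to-Hive pattern
below. This is only a simplified sketch: the URL, credentials and table
names are placeholders, and the real script takes them from your Airflow
connections and operator arguments.

from pyspark.sql import SparkSession

# Simplified sketch of what spark_jdbc_script.py does for jdbc_to_spark.
spark = (
    SparkSession.builder
    .appName("jdbc-to-spark-sketch")
    .enableHiveSupport()
    .getOrCreate()
)

# Read the source table over JDBC (all option values are placeholders).
df = (
    spark.read
    .format("jdbc")
    .option("url", "jdbc:postgresql://db-host:5432/mydb")
    .option("dbtable", "public.orders")
    .option("user", "reader")
    .option("password", "secret")
    .option("driver", "org.postgresql.Driver")
    .load()
)

# Write the result into a Hive metastore table.
df.write.mode("overwrite").saveAsTable("staging.orders")

spark.stop()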

You need to pass all the arguments to the operator; the PythonDoc is
self-explanatory:
https://github.com/apache/airflow/blob/master/airflow/contrib/operators/spark_jdbc_operator.py
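
Here is a minimal sketch of how you could wire the operator into a DAG.
The connection ids, table names and driver class below are placeholders
I made up; please check the docstring in spark_jdbc_operator.py for the
exact arguments and defaults in your Airflow version. You will also need
the JDBC driver jar available on the Spark classpath.

from datetime import datetime

from airflow import DAG
from airflow.contrib.operators.spark_jdbc_operator import SparkJDBCOperator

with DAG(
    dag_id="jdbc_to_hive_example",
    start_date=datetime(2019, 2, 1),
    schedule_interval="@daily",
) as dag:

    import_orders = SparkJDBCOperator(
        task_id="import_orders",
        cmd_type="jdbc_to_spark",             # direction of the transfer
        spark_conn_id="spark_default",        # Spark connection in Airflow
        jdbc_conn_id="postgres_orders",       # JDBC connection (placeholder)
        jdbc_table="public.orders",           # source table (placeholder)
        jdbc_driver="org.postgresql.Driver",  # driver class (placeholder)
        metastore_table="staging.orders",     # target Hive table (placeholder)
        save_mode="overwrite",
        num_executors=2,
        executor_memory="2g",
    )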

For further reference, this operator is also called the Sqark (SQL +
Spark) operator.

Hopefully, you're less lost now. If you have any further questions, let me
know.

Cheers, Fokko





On Fri, Feb 1, 2019 at 13:54 Iván Robla Albarrán <[email protected]> wrote:

> Hi,
>
> I am searching for a way to replace Apache Sqoop.
>
> I am looking at the SparkJDBCOperator, but I don't understand how I have
> to use it.
>
> Is it a version of the SparkSubmitOperator that includes a JDBC
> connection?
>
> Do I need to include Spark code?
>
> Any example?
>
> Thanks, I am very lost
>
> Regards,
> Iván Robla
>
