[
https://issues.apache.org/jira/browse/SQOOP-390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13631487#comment-13631487
]
Jarek Jarcec Cecho commented on SQOOP-390:
------------------------------------------
Hi [~metaruslan],
thank you very much for your feedback. The JIRA on Cloudera site is no longer
used, so if you're still interested in the functionality, you should go ahead
and create the ticket here on Apache JIRA.
I think that the information in docs should suggest that {{--direct}} option in
case of PostgreSQL works only with import and not with export. In case that
user is interested, then there is separate Connector that can utilize
{{pg_bulkload}} utility. However as you've mentioned, {{pg_bulkload}} is not
part of PostgreSQL and thus it's not activated by {{--direct}} parameter.
Jarcec
> PostgreSQL connector for direct export with pg_bulkload
> -------------------------------------------------------
>
> Key: SQOOP-390
> URL: https://issues.apache.org/jira/browse/SQOOP-390
> Project: Sqoop
> Issue Type: New Feature
> Reporter: Masatake Iwasaki
> Assignee: Masatake Iwasaki
> Fix For: 1.4.3
>
> Attachments: pgbulkload-connector-r6.patch, SQOOP-390-1.patch
>
>
> h1. Features
> * Fast data export into PostgreSQL database with
> [pg_bulkload|http://pgbulkload.projects.postgresql.org/index.html].
> * User can get benefit of functionality of pg_bulkload such as
> ** fast export bypassing shared bufferes and WAL,
> ** removing invalid data which cause parse error,
> ** ETL feature with filter functions.
> h1. Implementation
> h2. PGBulkloadExportMapper
> At first, each map tasks create their own staging table with names based on
> task attempt id because pg_bulkload holds table level lock.
> Arguments of pg_bulkload command can be passed via configuration object.
> The Mapper export data by invoking pg_bulkload command as external process.
> Commnad execution is done in the same way as DirectPostgresqlManager.
> h2. PGBulkloadExportReducer
> Reducer migrates data from staging tables into destination table.
> Reducer gets the names of staging tables as map output values.
> In order to do migration in a transaction, the number of reducers must be 1.
> (It is set by ConnectionManager internally).
> Migration is done in same way as Sqoop defalt connectors using "INSERT INTO
> dst ( SELECT * FROM src )".
> In the cleanup method, staging tables are dropped.
> If exception is raised in the reducer, garbage staging tables are left.
> User can delete them by executing this connector with --clear-staging-table
> option.
> h1. Requirements
> * pg_bulkload must be installed on DB server and all slave nodes.
> ** RPM for RedHat or CentOS is available in [download
> page|http://pgfoundry.org/frs/?group_id=1000261].
> * [PostgreSQL JDBC| http://jdbc.postgresql.org/download.html] is also
> required on client node (same as PostgresqlManager).
> * Superuser role of PostgreSQL database is required for execution of
> pg_bulkload.
> h1. Usage
> Currently there is no Factory class.
> Specify connection manager class name with --connection-manager option to use.
>
> {noformat}
> $ sqoop export --connect jdbc:postgresql://localhost:5432/test \
> --connection-manager
> com.cloudera.sqoop.manager.PGBulkloadManager \
> --table test --username postgres --export-dir=/test -m 1
> {noformat}
> You can also specify pg_bulkload configuration with Hadoop configuration
> properties.
> {noformat}
> $ sqoop export \
> -Dpgbulkload.bin="/usr/local/bin/pg_bulkload" \
> -Dpgbulkload.input.field.delim=$'\t' \
> -Dpgbulkload.check.constraints="YES" \
> -Dpgbulkload.parse.errors="INFINITE" \
> -Dpgbulkload.duplicate.errors="INFINITE" \
> --connect jdbc:postgresql://localhost:5432/test \
> --connection-manager com.cloudera.sqoop.manager.PGBulkloadManager \
> --table test --username postgres --export-dir=/test -m 1
> {noformat}
> h1. Test
> There is test class named PGBulkloadManagerManualTest extending TestExport.
> This test expects that
> * The major version of PostgreSQL is 9.0,
> * The version of pg_bulkload is 3.1.1,
> * PostgreSQL server is running on localhost:5432 of testing node,
> * there is database named sqooptest,
> * super user role named sqooptest exists,
> * no password is set for the role, or .pgpass file is created.
> Tests can be invoked as below.
> {noformat}
> # ant -Dtestcase=PGBulkloadManagerManualTest test
> {noformat}
> h2. Test on CentOS 6
> Install JDK and Ant:
> {noformat}
> # rpm -ivh jdk-6u29-linux-amd64.rpm
> # yum install ant-junit
> {noformat}
> Setup for PostgreSQL:
> {noformat}
> # wget http://ftp.postgresql.org/pub/source/v9.0.7/postgresql-9.0.7.tar.bz2
> # tar jxf postgresql-9.0.7.tar.bz2
> # cd postgresql-9.0.7
> # ./configure --prefix=/usr/local
> # make
> # make install
> # useradd postgres
> # mkdir /var/pgdata
> # chown postgres:postgres /var/pgdata
> # chmod 700 /var/pgdata
> # sudo -u postgres /usr/local/bin/initdb -D /var/pgdata -E utf-8
> # sudo -u postgres /usr/local/bin/pg_ctl start -D /var/pgdata -l
> /var/pgdata/postgresql.log
> # createuser -U postgres -s sqooptest
> # createdb -U sqooptest sqooptest
> {noformat}
> Setup for pg_bulkload and PostgreSQL JDBC Driver:
> {noformat}
> # cd ..
> # wget http://pgfoundry.org/frs/download.php/3176/pg_bulkload-3.1.1.tar.gz
> # tar zxf pg_bulkload-3.1.1.tar.gz
> # mv pg_bulkload-3.1.1 postgresql-9.0.7/contrib/
> # cd postgresql-9.0.7/contrib/pg_bulkload-3.1.1
> # make
> # make install
> # psql -U sqooptest -f /usr/local/share/postgresql/contrib/pg_bulkload.sql
> sqooptest
> # ln -s /usr/local/bin/pg_bulkload /usr/bin/
> {noformat}
> Setup for PostgreSQL JDBC Driver:
> {noformat}
> # wget http://jdbc.postgresql.org/download/postgresql-9.0-802.jdbc4.jar
> # cp postgresql-9.0-802.jdbc4.jar /usr/local/src/sqoop-trunk/lib/
> {noformat}
> run test.
> {noformat}
> # cd /usr/local/src/sqoop-trunk
> # ant -Dtestcase=PGBulkloadManagerManualTest test
> {noformat}
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira