mazhiyong created GOBBLIN-242:
---------------------------------

             Summary: distcp error java.lang.IllegalArgumentException: Wrong FS: hdfs://HDFS_A/data/gobblin-current.log, expected: hdfs://HDFS_B
                 Key: GOBBLIN-242
                 URL: https://issues.apache.org/jira/browse/GOBBLIN-242
             Project: Apache Gobblin
          Issue Type: Bug
            Reporter: mazhiyong


I am using gobblin-distcp to copy data between HDFS_A and HDFS_B.
Gobblin is deployed in Hadoop_A (which contains Yarn_A and HDFS_A).
When I run the gobblin-distcp job to copy data from HDFS_A to HDFS_B, it succeeds.
But when I run the gobblin-distcp job to copy data from HDFS_B to HDFS_A, it always fails.

*the container log*
2017-09-07 10:12:56,022 INFO [main] gobblin.runtime.TaskExecutor: Executing task task_distcp-hdfs-to-yarnhdfs_1504750223269_0
2017-09-07 10:12:56,076 INFO [TaskExecutor-0] gobblin.runtime.TaskExecutor: Submitting fork 0 of task task_distcp-hdfs-to-yarnhdfs_1504750223269_0
2017-09-07 10:12:56,089 INFO [main] gobblin.runtime.GobblinMultiTaskAttempt-attempt_1503884889988_9291_m_000000_0: Waiting for submitted tasks of job job_distcp-hdfs-to-yarnhdfs_1504750223269 to complete in container attempt_1503884889988_9291_m_000000_0...
2017-09-07 10:12:56,089 INFO [main] gobblin.runtime.GobblinMultiTaskAttempt-attempt_1503884889988_9291_m_000000_0: 1 out of 1 tasks of job job_distcp-hdfs-to-yarnhdfs_1504750223269 are running in container attempt_1503884889988_9291_m_000000_0
2017-09-07 10:12:56,111 INFO [ForkExecutor-0] gobblin.runtime.TaskContext: Found configured writer builder as gobblin.data.management.copy.writer.FileAwareInputStreamDataWriterBuilder
2017-09-07 10:12:56,111 INFO [TaskExecutor-0] gobblin.runtime.Task: Extracted 1 data records
2017-09-07 10:12:56,111 INFO [TaskExecutor-0] gobblin.runtime.Task: Row quality checker finished with results: 
2017-09-07 10:12:56,149 INFO [ForkExecutor-0] gobblin.runtime.fork.Fork-0: Wrapping writer gobblin.writer.PartitionedDataWriter@2774ab51
2017-09-07 10:12:56,225 WARN [ForkExecutor-0] gobblin.writer.RetryWriter: Caught exception. This may be retried.
{color:red}java.lang.IllegalArgumentException: Wrong FS: hdfs://HDFS_B/data/test/gobblin-current.log, expected: hdfs://HDFS_A{color}
        at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:648)
        at org.apache.hadoop.fs.FileSystem.makeQualified(FileSystem.java:468)
        at gobblin.data.management.copy.writer.FileAwareInputStreamDataWriter.writeImpl(FileAwareInputStreamDataWriter.java:218)
        at gobblin.data.management.copy.writer.FileAwareInputStreamDataWriter.writeImpl(FileAwareInputStreamDataWriter.java:166)
        at gobblin.data.management.copy.writer.FileAwareInputStreamDataWriter.writeImpl(FileAwareInputStreamDataWriter.java:82)
        at gobblin.instrumented.writer.InstrumentedDataWriterBase.write(InstrumentedDataWriterBase.java:165)
        at gobblin.instrumented.writer.InstrumentedDataWriter.write(InstrumentedDataWriter.java:38)
        at gobblin.instrumented.writer.InstrumentedDataWriterDecorator.writeImpl(InstrumentedDataWriterDecorator.java:76)
        at gobblin.instrumented.writer.InstrumentedDataWriterDecorator.write(InstrumentedDataWriterDecorator.java:68)
        at gobblin.writer.PartitionedDataWriter.write(PartitionedDataWriter.java:127)
        at gobblin.writer.RetryWriter$2.call(RetryWriter.java:116)
        at gobblin.writer.RetryWriter$2.call(RetryWriter.java:113)
        at com.github.rholder.retry.AttemptTimeLimiters$NoAttemptTimeLimit.call(AttemptTimeLimiters.java:78)
        at com.github.rholder.retry.Retryer.call(Retryer.java:160)
        at com.github.rholder.retry.Retryer$RetryerCallable.call(Retryer.java:318)
        at gobblin.writer.RetryWriter.callWithRetry(RetryWriter.java:140)
        at gobblin.writer.RetryWriter.write(RetryWriter.java:121)
        at gobblin.runtime.fork.Fork.processRecord(Fork.java:426)
        at gobblin.runtime.fork.AsynchronousFork.processRecord(AsynchronousFork.java:98)
        at gobblin.runtime.fork.AsynchronousFork.processRecords(AsynchronousFork.java:81)
        at gobblin.runtime.fork.Fork.run(Fork.java:180)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
2017-09-07 10:12:57,227 WARN [ForkExecutor-0] gobblin.writer.RetryWriter: Caught exception. This may be retried.
java.io.IOException: gobblin.data.management.copy.writer.FileAwareInputStreamDataWriter can only process one file.
        at gobblin.data.management.copy.writer.FileAwareInputStreamDataWriter.writeImpl(FileAwareInputStreamDataWriter.java:162)
        at gobblin.data.management.copy.writer.FileAwareInputStreamDataWriter.writeImpl(FileAwareInputStreamDataWriter.java:82)
        at gobblin.instrumented.writer.InstrumentedDataWriterBase.write(InstrumentedDataWriterBase.java:165)
        at gobblin.instrumented.writer.InstrumentedDataWriter.write(InstrumentedDataWriter.java:38)
        at gobblin.instrumented.writer.InstrumentedDataWriterDecorator.writeImpl(InstrumentedDataWriterDecorator.java:76)
        at gobblin.instrumented.writer.InstrumentedDataWriterDecorator.write(InstrumentedDataWriterDecorator.java:68)
        at gobblin.writer.PartitionedDataWriter.write(PartitionedDataWriter.java:127)
        at gobblin.writer.RetryWriter$2.call(RetryWriter.java:116)
        at gobblin.writer.RetryWriter$2.call(RetryWriter.java:113)
        at com.github.rholder.retry.AttemptTimeLimiters$NoAttemptTimeLimit.call(AttemptTimeLimiters.java:78)
        at com.github.rholder.retry.Retryer.call(Retryer.java:160)
        at com.github.rholder.retry.Retryer$RetryerCallable.call(Retryer.java:318)
        at gobblin.writer.RetryWriter.callWithRetry(RetryWriter.java:140)
        at gobblin.writer.RetryWriter.write(RetryWriter.java:121)
        at gobblin.runtime.fork.Fork.processRecord(Fork.java:426)
        at gobblin.runtime.fork.AsynchronousFork.processRecord(AsynchronousFork.java:98)
        at gobblin.runtime.fork.AsynchronousFork.processRecords(AsynchronousFork.java:81)
        at gobblin.runtime.fork.Fork.run(Fork.java:180)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
2017-09-07 10:12:59,228 WARN [ForkExecutor-0] gobblin.writer.RetryWriter: Caught exception. This may be retried.
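
For context, the first stack trace shows FileSystem.makeQualified being called from FileAwareInputStreamDataWriter.writeImpl on a file system rooted at hdfs://HDFS_A with a path that is fully qualified for hdfs://HDFS_B. The sketch below is a simplified model of that kind of scheme/authority comparison, written with java.net.URI only. It is not Hadoop's actual FileSystem.checkPath code, and the names WrongFsSketch, belongsTo and checkPath are hypothetical, chosen for illustration.

```java
import java.net.URI;

public class WrongFsSketch {

    /**
     * Simplified model (an assumption, not Hadoop source): a path is accepted
     * when it carries no scheme, or when its scheme and authority match those
     * of the file system URI.
     */
    static boolean belongsTo(URI fsUri, URI path) {
        String scheme = path.getScheme();
        if (scheme == null) {
            return true; // scheme-less paths are resolved against the file system itself
        }
        if (!scheme.equalsIgnoreCase(fsUri.getScheme())) {
            return false;
        }
        String authority = path.getAuthority();
        // Simplification: a missing authority (hdfs:///x) is treated as matching.
        return authority == null || authority.equalsIgnoreCase(fsUri.getAuthority());
    }

    /** Throws with a message shaped like the one in the container log. */
    static void checkPath(URI fsUri, URI path) {
        if (!belongsTo(fsUri, path)) {
            throw new IllegalArgumentException("Wrong FS: " + path + ", expected: " + fsUri);
        }
    }

    public static void main(String[] args) {
        URI destinationFs = URI.create("hdfs://HDFS_A");

        // A scheme-less path is accepted by the destination file system.
        checkPath(destinationFs, URI.create("/data/test/gobblin-current.log"));

        // A path qualified for the other cluster is rejected.
        try {
            checkPath(destinationFs, URI.create("hdfs://HDFS_B/data/test/gobblin-current.log"));
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

If this model is roughly right, the failure would come from a source-qualified path being handed to the destination file system, rather than from the job config keys themselves. That is only a guess from the trace.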

*my job config*
job.name=distcp-hdfs-to-yarnhdfs
job.group=distcp-hdfs-to-yarnhdfs
job.description=distcp
job.class=gobblin.azkaban.AzkabanJobLauncher

source.class=gobblin.data.management.copy.CopySource
source.filebased.fs.uri=hdfs://HDFS_B
gobblin.dataset.pattern=/data/test/*.log
#gobblin.dataset.pattern=/data/huiting_3000h_test_set/*.tar.gz
#gobblin.dataset.pattern=/gobblin/distcp/data/*.tar.gz

extract.namespace=gobblin.copy

converter.classes=gobblin.converter.IdentityConverter

writer.destination.type=HDFS
writer.fs.uri=hdfs://HDFS_A
#writer.output.format=txt
writer.builder.class=gobblin.data.management.copy.writer.FileAwareInputStreamDataWriterBuilder
writer.file.path.type=tablename

data.publisher.type=gobblin.data.management.copy.publisher.CopyDataPublisher
data.publisher.final.dir=/gobblin/data

data.publisher.final.name=mz

distcp.persist.dir=/gobblin/distcp/data

task.maxretries=0

workunit.retry.enabled=false

# Intermediate steps configuration.
work.dir=/gobblin/distcp
state.store.dir=${work.dir}/state-store
writer.staging.dir=${work.dir}/taskStaging
writer.output.dir=${work.dir}/taskOutput

mr.job.root.dir=${work.dir}/working

job.lock.enabled=true
job.lock.dir=${work.dir}/locks




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
