Kip Kohn created GOBBLIN-1906:
---------------------------------

             Summary: protect against nulls when converting `State` to a `hadoop.conf.Configuration`
                 Key: GOBBLIN-1906
                 URL: https://issues.apache.org/jira/browse/GOBBLIN-1906
             Project: Apache Gobblin
          Issue Type: Bug
          Components: gobblin-core
            Reporter: Kip Kohn
            Assignee: Abhishek Tiwari


A customer reported seeing:
{code:java}
Error: java.io.IOException: Task failed: java.lang.IllegalArgumentException: The value of property <<redacted>> must not be null
  at com.google.common.base.Preconditions.checkArgument(Preconditions.java:146)
  at org.apache.hadoop.conf.Configuration.set(Configuration.java:1260)
  at org.apache.hadoop.conf.Configuration.set(Configuration.java:1241)
  at org.apache.gobblin.util.JobConfigurationUtils.putStateIntoConfiguration(JobConfigurationUtils.java:95)
  at org.apache.gobblin.writer.FsDataWriter.<init>(FsDataWriter.java:102)
  at org.apache.gobblin.writer.GobblinBaseOrcWriter.<init>(GobblinBaseOrcWriter.java:65)
  at org.apache.gobblin.writer.GobblinOrcWriter.<init>(GobblinOrcWriter.java:42)
  at <<redacted>>
  at org.apache.gobblin.writer.PartitionedDataWriter$4.get(PartitionedDataWriter.java:230)
  at org.apache.gobblin.writer.PartitionedDataWriter$4.get(PartitionedDataWriter.java:225)
  at org.apache.gobblin.writer.CloseOnFlushWriterWrapper.<init>(CloseOnFlushWriterWrapper.java:73)
  at org.apache.gobblin.writer.PartitionedDataWriter.<init>(PartitionedDataWriter.java:224)
  at org.apache.gobblin.runtime.fork.Fork.buildWriter(Fork.java:571)
  at org.apache.gobblin.runtime.fork.Fork.buildWriterIfNotPresent(Fork.java:579)
  at org.apache.gobblin.runtime.fork.Fork.processRecord(Fork.java:525)
  at org.apache.gobblin.runtime.fork.AsynchronousFork.processRecord(AsynchronousFork.java:103)
  at org.apache.gobblin.runtime.fork.AsynchronousFork.processRecords(AsynchronousFork.java:86)
  at org.apache.gobblin.runtime.fork.Fork.run(Fork.java:257)
  at org.apache.gobblin.util.executors.MDCPropagatingRunnable.run(MDCPropagatingRunnable.java:39)
  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
  at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:111)
  at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:58)
  at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:75)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
  at java.lang.Thread.run(Thread.java:748) (Gobblin task id <<redacted>>, container id attempt_1690893552521_3376012_m_000111_0)
  at org.apache.gobblin.runtime.GobblinMultiTaskAttempt.persistTaskStateStore(GobblinMultiTaskAttempt.java:367)
... {code}
This appears to arise from concurrent modification of the `State`'s underlying `Properties`: between the time the `keySet()` is first read and when each value is subsequently looked up in the same `Properties`, a key is presumably removed, so its lookup returns null.
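
The suspected interleaving can be reproduced deterministically on a single thread. This is only an illustrative sketch: `NullValueRaceDemo` and `readKeysThenValues` are hypothetical names, it assumes the key set is captured as a snapshot (here via `Properties.stringPropertyNames()`), and a plain `remove()` stands in for the concurrent writer.

{code:java}
import java.util.Map;
import java.util.Properties;
import java.util.Set;
import java.util.TreeMap;

public class NullValueRaceDemo {
  /**
   * Hypothetical helper: reads the key set first, then (simulating another thread)
   * removes one key, then looks up every value from the original key set.
   */
  public static Map<String, String> readKeysThenValues(Properties props, String keyRemovedInBetween) {
    // Snapshot of the keys; later changes to props are not reflected in this set.
    Set<String> keys = props.stringPropertyNames();

    // Stand-in for a concurrent modification happening between the two reads.
    props.remove(keyRemovedInBetween);

    // TreeMap permits null values, so the removed key shows up as key -> null.
    Map<String, String> seen = new TreeMap<>();
    for (String key : keys) {
      seen.put(key, props.getProperty(key));  // null for the removed key
    }
    return seen;
  }

  public static void main(String[] args) {
    Properties props = new Properties();
    props.setProperty("a", "1");
    props.setProperty("b", "2");
    System.out.println(readKeysThenValues(props, "b"));  // prints {a=1, b=null}
  }
}
{code}

Since `Properties` (a `Hashtable`) cannot store null values, a null from `getProperty` can only mean the key was no longer present at lookup time.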

Although the customer's implementation seems to warrant synchronization, a null value is certain to be rejected by `o.a.hadoop.conf.Configuration`, so `JobConfigurationUtils.putStateIntoConfiguration` should defensively filter such entries out ahead of time.
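
A minimal sketch of that defensive filter follows. It is not Gobblin's actual patch: `NullSafeConfigCopy` and `putNonNullValues` are hypothetical, and the key source, lookup, and setter are passed in as functions so the sketch runs without Hadoop on the classpath. In Gobblin one would presumably pass `state.getPropertyNames()`, `state::getProp` and `conf::set` (Hadoop's `Configuration.set(String, String)`).

{code:java}
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiConsumer;
import java.util.function.Function;

public class NullSafeConfigCopy {
  /**
   * Hypothetical helper: copies each key's value via {@code setter}, skipping keys
   * whose value is null at lookup time (e.g. removed concurrently).
   *
   * @return the number of keys skipped because their value was null
   */
  public static int putNonNullValues(Iterable<String> keys,
                                     Function<String, String> lookup,
                                     BiConsumer<String, String> setter) {
    int skipped = 0;
    for (String key : keys) {
      String value = lookup.apply(key);
      if (value == null) {
        // Configuration.set(name, null) fails with IllegalArgumentException,
        // so drop the entry instead of failing the whole task.
        skipped++;
        continue;
      }
      setter.accept(key, value);
    }
    return skipped;
  }

  public static void main(String[] args) {
    // "b" is in the key list but has no value, as if removed after the keys were read.
    Map<String, String> source = new HashMap<>();
    source.put("a", "1");
    Map<String, String> target = new HashMap<>();
    int skipped = putNonNullValues(Arrays.asList("a", "b"), source::get, target::put);
    System.out.println("copied=" + target + ", skipped=" + skipped);  // prints copied={a=1}, skipped=1
  }
}
{code}

The filter only avoids the task failure; a key removed mid-copy is still silently missing from the resulting configuration, which is why synchronizing the customer's code remains worthwhile.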



--
This message was sent by Atlassian Jira
(v8.20.10#820010)