sivabalan narayanan created HUDI-2493:
-----------------------------------------
Summary: Verify removing glob pattern works w/ all key generators
Key: HUDI-2493
URL: https://issues.apache.org/jira/browse/HUDI-2493
Project: Apache Hudi
Issue Type: Improvement
Components: Spark Integration
Reporter: sivabalan narayanan
In the last release, we added support for reading a Hudi dataset without a glob pattern. That is, when reading a Hudi dataset,
spark.read.format("hudi").load(basePath+"/*/*")
can now be written as
spark.read.format("hudi").load(basePath)
Suffixing the base path with "/*/*" is no longer required.
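The "/*/*" suffix in the old form of the call is an ordinary path glob. For a table with single-level partitions, the first `*` matches a partition directory and the second matches a data file, so the number of `*` segments has to match the table's partition depth. The sketch below uses Java NIO's `PathMatcher` as a stand-in for Hadoop's glob expansion (in both, `*` does not cross a `/`). The file layout and the `GlobDepthDemo` / `matching` names are hypothetical and only illustrate the matching rule.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class GlobDepthDemo {

    // Returns the paths (relative to a hypothetical basePath) that the glob matches.
    static List<String> matching(String glob, List<String> relativePaths) {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + glob);
        List<String> matched = new ArrayList<>();
        for (String p : relativePaths) {
            if (matcher.matches(Paths.get(p))) {
                matched.add(p);
            }
        }
        return matched;
    }

    public static void main(String[] args) {
        // Hypothetical layouts under basePath: one-level and three-level partitions.
        List<String> files = Arrays.asList(
                "driver-1/file1.parquet",
                "2021/09/27/file2.parquet");
        // "*/*" only reaches files one partition level deep.
        System.out.println(matching("*/*", files));
        // A three-level partition layout needs "*/*/*/*".
        System.out.println(matching("*/*/*/*", files));
    }
}
```

With load(basePath), the caller no longer has to choose the glob depth to match the partition layout, which is the convenience this issue wants to confirm for every key generator.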
However, we need to verify that this works for all key generators before we
announce that it can be used in general. Otherwise, we have to call out which
key generators it works for and put in a fix for those it does not work for.
For example, I tried removing the glob pattern from a few key generator tests in
TestCOWDataSource, and some of them failed:
[https://github.com/apache/hudi/blob/36be28712196ff4427c41b0aa885c7fcd7356d7f/hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/TestCOWDataSource.scala#L413]
Key generators that worked after removing the glob pattern:
SimpleKeyGenerator, ComplexKeyGenerator, GlobalDeleteKeyGenerator,
NonpartitionedKeyGenerator
Key generators that did not work:
CustomKeyGenerator, TimestampBasedKeyGenerator
You can reproduce this locally by removing the glob pattern from these tests.
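One difference between the two groups that may be worth checking first (a hypothesis, not a confirmed root cause) is how the partition path relates to the partition field. SimpleKeyGenerator and ComplexKeyGenerator write the field value as-is, while TimestampBasedKeyGenerator formats a timestamp field into a date string using its configured output format, and CustomKeyGenerator does the same for fields declared as TIMESTAMP. If the glob-free read path derives partition column values from directory names, a transformed path would not map back to the original column value. The stdlib-only sketch below illustrates that gap. The class and helper names, the yyyyMMdd output format, the UTC zone, and the naive Long.parseLong "inference" are all assumptions for illustration, not Hudi code.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

public class PartitionPathRoundTrip {

    // Assumed output format, standing in for a timestamp key generator's configured date format.
    private static final DateTimeFormatter OUTPUT_FORMAT =
            DateTimeFormatter.ofPattern("yyyyMMdd").withZone(ZoneOffset.UTC);

    // Simple/Complex-style: the partition path is the raw field value.
    static String rawPartitionPath(long fieldValue) {
        return Long.toString(fieldValue);
    }

    // Timestamp-style: epoch millis formatted into a date string.
    static String timestampPartitionPath(long epochMillis) {
        return OUTPUT_FORMAT.format(Instant.ofEpochMilli(epochMillis));
    }

    // Hypothetical reader-side inference: parse the directory name back into
    // the column's type (long here) and compare with the original value.
    static boolean roundTrips(String partitionPath, long originalValue) {
        try {
            return Long.parseLong(partitionPath) == originalValue;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        long ts = 1600000000000L; // 2020-09-13T12:26:40Z
        String raw = rawPartitionPath(ts);
        String formatted = timestampPartitionPath(ts);
        System.out.println(raw + " round-trips: " + roundTrips(raw, ts));
        System.out.println(formatted + " round-trips: " + roundTrips(formatted, ts));
    }
}
```

The raw path parses back to the original value, while the formatted path ("20200913") parses to a different number, so any read path that trusts directory names for column values would see different data for the two groups.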
Stack trace for the TimestampBasedKeyGenerator test:
{code:java}
0 [main] WARN org.apache.spark.util.Utils - Your hostname, Sivabalans-MacBook-Pro.local resolves to a loopback address: 127.0.0.1; using 10.0.0.202 instead (on interface en0)
0 [main] WARN org.apache.spark.util.Utils - Your hostname, Sivabalans-MacBook-Pro.local resolves to a loopback address: 127.0.0.1; using 10.0.0.202 instead (on interface en0)
1 [main] WARN org.apache.spark.util.Utils - Set SPARK_LOCAL_IP if you need to bind to another address
390 [main] WARN org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
11317 [main] WARN org.apache.hudi.metadata.HoodieBackedTableMetadata - Metadata table was not found at path file:/var/folders/ym/8yjkm3n90kq8tk4gfmvk7y140000gn/T/junit1114662825639168138/dataset/.hoodie/metadata
11515 [main] WARN org.apache.hudi.metadata.HoodieBackedTableMetadata - Metadata table was not found at path file:/var/folders/ym/8yjkm3n90kq8tk4gfmvk7y140000gn/T/junit1114662825639168138/dataset/.hoodie/metadata
11840 [main] WARN org.apache.spark.util.Utils - Truncated the string representation of a plan since it was too large. This behavior can be adjusted by setting 'spark.debug.maxToStringFields' in SparkEnv.conf.
12319 [main] WARN org.apache.hudi.testutils.HoodieClientTestHarness - Closing file-system instance used in previous test-run
org.opentest4j.AssertionFailedError: 
Expected :true
Actual   :false
	at org.junit.jupiter.api.AssertionUtils.fail(AssertionUtils.java:55)
	at org.junit.jupiter.api.AssertTrue.assertTrue(AssertTrue.java:40)
	at org.junit.jupiter.api.AssertTrue.assertTrue(AssertTrue.java:35)
	at org.junit.jupiter.api.Assertions.assertTrue(Assertions.java:162)
	at org.apache.hudi.functional.TestCOWDataSource.testSparkPartitonByWithTimestampBasedKeyGenerator(TestCOWDataSource.scala:517)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.junit.platform.commons.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:688)
	at org.junit.jupiter.engine.execution.MethodInvocation.proceed(MethodInvocation.java:60)
	at org.junit.jupiter.engine.execution.InvocationInterceptorChain$ValidatingInvocation.proceed(InvocationInterceptorChain.java:131)
	at org.junit.jupiter.engine.extension.TimeoutExtension.intercept(TimeoutExtension.java:149)
	at org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestableMethod(TimeoutExtension.java:140)
	at org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestMethod(TimeoutExtension.java:84)
	at org.junit.jupiter.engine.execution.ExecutableInvoker$ReflectiveInterceptorCall.lambda$ofVoidMethod$0(ExecutableInvoker.java:115)
	at org.junit.jupiter.engine.execution.ExecutableInvoker.lambda$invoke$0(ExecutableInvoker.java:105)
	at org.junit.jupiter.engine.execution.InvocationInterceptorChain$InterceptedInvocation.proceed(InvocationInterceptorChain.java:106)
	at org.junit.jupiter.engine.execution.InvocationInterceptorChain.proceed(InvocationInterceptorChain.java:64)
	at org.junit.jupiter.engine.execution.InvocationInterceptorChain.chainAndInvoke(InvocationInterceptorChain.java:45)
	at org.junit.jupiter.engine.execution.InvocationInterceptorChain.invoke(InvocationInterceptorChain.java:37)
	at org.junit.jupiter.engine.execution.ExecutableInvoker.invoke(ExecutableInvoker.java:104)
	at org.junit.jupiter.engine.execution.ExecutableInvoker.invoke(ExecutableInvoker.java:98)
	at org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.lambda$invokeTestMethod$6(TestMethodTestDescriptor.java:212)
	at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
	at org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.invokeTestMethod(TestMethodTestDescriptor.java:208)
	at org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.execute(TestMethodTestDescriptor.java:137)
	at org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.execute(TestMethodTestDescriptor.java:71)
	at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$5(NodeTestTask.java:139)
	at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
	at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$7(NodeTestTask.java:129)
	at org.junit.platform.engine.support.hierarchical.Node.around(Node.java:137)
	at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$8(NodeTestTask.java:127)
	at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
	at org.junit.platform.engine.support.hierarchical.NodeTestTask.executeRecursively(NodeTestTask.java:126)
	at org.junit.platform.engine.support.hierarchical.NodeTestTask.execute(NodeTestTask.java:84)
	at java.util.ArrayList.forEach(ArrayList.java:1257)
	at org.junit.platform.engine.support.hierarchical.SameThreadHierarchicalTestExecutorService.invokeAll(SameThreadHierarchicalTestExecutorService.java:38)
	at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$5(NodeTestTask.java:143)
	at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
	at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$7(NodeTestTask.java:129)
	at org.junit.platform.engine.support.hierarchical.Node.around(Node.java:137)
	at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$8(NodeTestTask.java:127)
	at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
	at org.junit.platform.engine.support.hierarchical.NodeTestTask.executeRecursively(NodeTestTask.java:126)
	at org.junit.platform.engine.support.hierarchical.NodeTestTask.execute(NodeTestTask.java:84)
	at java.util.ArrayList.forEach(ArrayList.java:1257)
	at org.junit.platform.engine.support.hierarchical.SameThreadHierarchicalTestExecutorService.invokeAll(SameThreadHierarchicalTestExecutorService.java:38)
	at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$5(NodeTestTask.java:143)
	at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
	at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$7(NodeTestTask.java:129)
	at org.junit.platform.engine.support.hierarchical.Node.around(Node.java:137)
	at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$8(NodeTestTask.java:127)
	at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
	at org.junit.platform.engine.support.hierarchical.NodeTestTask.executeRecursively(NodeTestTask.java:126)
	at org.junit.platform.engine.support.hierarchical.NodeTestTask.execute(NodeTestTask.java:84)
	at org.junit.platform.engine.support.hierarchical.SameThreadHierarchicalTestExecutorService.submit(SameThreadHierarchicalTestExecutorService.java:32)
	at org.junit.platform.engine.support.hierarchical.HierarchicalTestExecutor.execute(HierarchicalTestExecutor.java:57)
	at org.junit.platform.engine.support.hierarchical.HierarchicalTestEngine.execute(HierarchicalTestEngine.java:51)
	at org.junit.platform.launcher.core.EngineExecutionOrchestrator.execute(EngineExecutionOrchestrator.java:107)
	at org.junit.platform.launcher.core.EngineExecutionOrchestrator.execute(EngineExecutionOrchestrator.java:87)
	at org.junit.platform.launcher.core.EngineExecutionOrchestrator.lambda$execute$0(EngineExecutionOrchestrator.java:53)
	at org.junit.platform.launcher.core.EngineExecutionOrchestrator.withInterceptedStreams(EngineExecutionOrchestrator.java:66)
	at org.junit.platform.launcher.core.EngineExecutionOrchestrator.execute(EngineExecutionOrchestrator.java:51)
	at org.junit.platform.launcher.core.DefaultLauncher.execute(DefaultLauncher.java:87)
	at org.junit.platform.launcher.core.DefaultLauncher.execute(DefaultLauncher.java:66)
	at com.intellij.junit5.JUnit5IdeaTestRunner.startRunnerWithArgs(JUnit5IdeaTestRunner.java:71)
	at com.intellij.rt.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:33)
	at com.intellij.rt.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:221)
	at com.intellij.rt.junit.JUnitStarter.main(JUnitStarter.java:54)
{code}
--
This message was sent by Atlassian Jira
(v8.3.4#803005)