nsivabalan edited a comment on pull request #3967: URL: https://github.com/apache/hudi/pull/3967#issuecomment-967486437
@vinothchandar: Regarding structured streaming and the timeline server being closed early, this is what I see. At the end of the first micro batch, the write client is closed, which triggers closure of the timeline service. Subsequent micro batches still succeed, though. I added some logs for testStructuredStreaming (using direct markers):
```
1833 [main] WARN org.apache.spark.util.Utils - Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
streaming starting
10085 [stream execution thread for [id = 41aab979-66f5-4a58-82bc-751838639d48, runId = e3a570e9-9cb2-4216-af1d-791f563be9ba]] WARN org.apache.hudi.HoodieStreamingSink - writing to HoodieSparkSqlWriter
10381 [stream execution thread for [id = 41aab979-66f5-4a58-82bc-751838639d48, runId = e3a570e9-9cb2-4216-af1d-791f563be9ba]] WARN org.apache.hudi.client.AbstractHoodieClient - Constructor
10381 [stream execution thread for [id = 41aab979-66f5-4a58-82bc-751838639d48, runId = e3a570e9-9cb2-4216-af1d-791f563be9ba]] WARN org.apache.hudi.client.AbstractHoodieClient - Starting ETL server
10381 [stream execution thread for [id = 41aab979-66f5-4a58-82bc-751838639d48, runId = e3a570e9-9cb2-4216-af1d-791f563be9ba]] WARN org.apache.hudi.client.AbstractHoodieClient - Creating embedded timeline server
10387 [stream execution thread for [id = 41aab979-66f5-4a58-82bc-751838639d48, runId = e3a570e9-9cb2-4216-af1d-791f563be9ba]] WARN org.apache.hudi.client.embedded.EmbeddedTimelineService - Starting timeline server ::
10625 [stream execution thread for [id = 41aab979-66f5-4a58-82bc-751838639d48, runId = e3a570e9-9cb2-4216-af1d-791f563be9ba]] WARN org.apache.hudi.client.AbstractHoodieWriteClient - Constructor
13093 [pool-18-thread-2] WARN org.apache.hudi.table.marker.DirectWriteMarkers - Creating marker file /var/folders/ym/8yjkm3n90kq8tk4gfmvk7y140000gn/T/junit3225050705126728730/dataset/dest/.hoodie/.temp/20211112152238/2016/03/15/f743c5c8-6b42-4d4d-93df-2a34d7a3e9d8-0_0-26-248_20211112152238.parquet.marker.CREATE
13802 [pool-20-thread-2] WARN org.apache.hudi.table.marker.DirectWriteMarkers - Creating marker file /var/folders/ym/8yjkm3n90kq8tk4gfmvk7y140000gn/T/junit3225050705126728730/dataset/dest/.hoodie/.temp/20211112152238/2015/03/16/ff7dc47f-3c79-4472-98b0-09b716a5698a-0_1-32-249_20211112152238.parquet.marker.CREATE
13802 [pool-19-thread-2] WARN org.apache.hudi.table.marker.DirectWriteMarkers - Creating marker file /var/folders/ym/8yjkm3n90kq8tk4gfmvk7y140000gn/T/junit3225050705126728730/dataset/dest/.hoodie/.temp/20211112152238/2015/03/17/e9193d74-3e04-4096-8e05-a3b872df0a0e-0_2-32-250_20211112152238.parquet.marker.CREATE
14231 [stream execution thread for [id = 41aab979-66f5-4a58-82bc-751838639d48, runId = e3a570e9-9cb2-4216-af1d-791f563be9ba]] WARN org.apache.hudi.table.marker.DirectWriteMarkers - Returning marker files
14430 [stream execution thread for [id = 41aab979-66f5-4a58-82bc-751838639d48, runId = e3a570e9-9cb2-4216-af1d-791f563be9ba]] WARN org.apache.hudi.client.AbstractHoodieWriteClient - Committing 20211112152238 commit
14431 [stream execution thread for [id = 41aab979-66f5-4a58-82bc-751838639d48, runId = e3a570e9-9cb2-4216-af1d-791f563be9ba]] WARN org.apache.hudi.client.AbstractHoodieWriteClient - Close()
14431 [stream execution thread for [id = 41aab979-66f5-4a58-82bc-751838639d48, runId = e3a570e9-9cb2-4216-af1d-791f563be9ba]] WARN org.apache.hudi.client.embedded.EmbeddedTimelineService - Trying to close timeline server
14431 [stream execution thread for [id = 41aab979-66f5-4a58-82bc-751838639d48, runId = e3a570e9-9cb2-4216-af1d-791f563be9ba]] WARN org.apache.hudi.client.embedded.EmbeddedTimelineService - Closing Timeline server
14446 [stream execution thread for [id = 41aab979-66f5-4a58-82bc-751838639d48, runId = e3a570e9-9cb2-4216-af1d-791f563be9ba]] WARN org.apache.hudi.HoodieStreamingSink - Micro batch id=0 succeeded for commit=20211112152238
14446 [stream execution thread for [id = 41aab979-66f5-4a58-82bc-751838639d48, runId = e3a570e9-9cb2-4216-af1d-791f563be9ba]] WARN org.apache.hudi.HoodieStreamingSink - Micro batch id=0 succeeded
14494 [stream execution thread for [id = 41aab979-66f5-4a58-82bc-751838639d48, runId = e3a570e9-9cb2-4216-af1d-791f563be9ba]] WARN org.apache.spark.sql.execution.streaming.ProcessingTimeExecutor - Current batch is falling behind. The trigger interval is 100 milliseconds, but spent 4614 milliseconds
15031 [ForkJoinPool-1-worker-11] WARN org.apache.spark.util.Utils - Truncated the string representation of a plan since it was too large. This behavior can be adjusted by setting 'spark.debug.maxToStringFields' in SparkEnv.conf.
15571 [stream execution thread for [id = 41aab979-66f5-4a58-82bc-751838639d48, runId = e3a570e9-9cb2-4216-af1d-791f563be9ba]] WARN org.apache.hudi.HoodieStreamingSink - writing to HoodieSparkSqlWriter
17992 [Executor task launch worker for task 315] WARN org.apache.hudi.table.marker.DirectWriteMarkers - Creating marker file /var/folders/ym/8yjkm3n90kq8tk4gfmvk7y140000gn/T/junit3225050705126728730/dataset/dest/.hoodie/.temp/20211112152243/2016/03/15/f743c5c8-6b42-4d4d-93df-2a34d7a3e9d8-0_0-67-315_20211112152243.parquet.marker.MERGE
18353 [Executor task launch worker for task 316] WARN org.apache.hudi.table.marker.DirectWriteMarkers - Creating marker file /var/folders/ym/8yjkm3n90kq8tk4gfmvk7y140000gn/T/junit3225050705126728730/dataset/dest/.hoodie/.temp/20211112152243/2015/03/16/ff7dc47f-3c79-4472-98b0-09b716a5698a-0_1-73-316_20211112152243.parquet.marker.MERGE
18353 [Executor task launch worker for task 317] WARN org.apache.hudi.table.marker.DirectWriteMarkers - Creating marker file /var/folders/ym/8yjkm3n90kq8tk4gfmvk7y140000gn/T/junit3225050705126728730/dataset/dest/.hoodie/.temp/20211112152243/2015/03/17/e9193d74-3e04-4096-8e05-a3b872df0a0e-0_2-73-317_20211112152243.parquet.marker.MERGE
18790 [stream execution thread for [id = 41aab979-66f5-4a58-82bc-751838639d48, runId = e3a570e9-9cb2-4216-af1d-791f563be9ba]] WARN org.apache.hudi.table.marker.DirectWriteMarkers - Returning marker files
18918 [stream execution thread for [id = 41aab979-66f5-4a58-82bc-751838639d48, runId = e3a570e9-9cb2-4216-af1d-791f563be9ba]] WARN org.apache.hudi.client.AbstractHoodieWriteClient - Committing 20211112152243 commit
18918 [stream execution thread for [id = 41aab979-66f5-4a58-82bc-751838639d48, runId = e3a570e9-9cb2-4216-af1d-791f563be9ba]] WARN org.apache.hudi.client.AbstractHoodieWriteClient - Close()
18918 [stream execution thread for [id = 41aab979-66f5-4a58-82bc-751838639d48, runId = e3a570e9-9cb2-4216-af1d-791f563be9ba]] WARN org.apache.hudi.HoodieStreamingSink - Micro batch id=1 succeeded for commit=20211112152243
18918 [stream execution thread for [id = 41aab979-66f5-4a58-82bc-751838639d48, runId = e3a570e9-9cb2-4216-af1d-791f563be9ba]] WARN org.apache.hudi.HoodieStreamingSink - Micro batch id=1 succeeded
18952 [stream execution thread for [id = 41aab979-66f5-4a58-82bc-751838639d48, runId = e3a570e9-9cb2-4216-af1d-791f563be9ba]] WARN org.apache.spark.sql.execution.streaming.ProcessingTimeExecutor - Current batch is falling behind. The trigger interval is 100 milliseconds, but spent 3472 milliseconds
streaming ends
20593 [ForkJoinPool-1-worker-11] WARN org.apache.hudi.metadata.HoodieBackedTableMetadata - Metadata table was not found at path file:/var/folders/ym/8yjkm3n90kq8tk4gfmvk7y140000gn/T/junit3225050705126728730/dataset/dest/.hoodie/metadata
22315 [ForkJoinPool-1-worker-11] WARN org.apache.hudi.metadata.HoodieBackedTableMetadata - Metadata table was not found at path file:/var/folders/ym/8yjkm3n90kq8tk4gfmvk7y140000gn/T/junit3225050705126728730/dataset/dest/.hoodie/metadata
23465 [main] WARN org.apache.hudi.testutils.HoodieClientTestHarness - Closing file-system instance used in previous test-run
23482 [stream execution thread for [id = 41aab979-66f5-4a58-82bc-751838639d48, runId = e3a570e9-9cb2-4216-af1d-791f563be9ba]] WARN org.apache.spark.sql.execution.datasources.InMemoryFileIndex - The directory file:/var/folders/ym/8yjkm3n90kq8tk4gfmvk7y140000gn/T/junit3225050705126728730/dataset/source was not found. Was it deleted very recently?
```
The test was using direct marker types for the purpose of collecting these logs. If I switch to timeline-server-based markers (the intent of this patch), the test fails because marker creation for the second batch fails. Do you think we need to revisit the closure of the write client in the structured streaming code?
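To make the lifecycle issue concrete, here is a minimal, self-contained sketch of the shape of the problem. The class and method names below are hypothetical stand-ins, not the actual Hudi APIs: a write client owns an embedded server, the client's `close()` tears the server down (as the `Close()` / `Closing Timeline server` log lines above show), so any later batch whose marker creation must go through that server fails, while direct file-based markers would not care.

```java
import java.io.IOException;

public class TimelineServerLifecycleSketch {

    // Stand-in for an embedded timeline server: server-routed marker
    // creation only works while the server is running.
    static class EmbeddedServer {
        private boolean running = true;

        void close() {
            running = false;
        }

        void createMarker(String name) throws IOException {
            if (!running) {
                throw new IOException("timeline server already closed: " + name);
            }
        }
    }

    // Stand-in for the write client, which owns the embedded server
    // and shuts it down as part of its own close().
    static class WriteClient implements AutoCloseable {
        final EmbeddedServer server = new EmbeddedServer();

        void commitBatch(int batchId) throws IOException {
            server.createMarker("batch-" + batchId + ".marker");
        }

        @Override
        public void close() {
            server.close();
        }
    }

    public static void main(String[] args) throws IOException {
        WriteClient client = new WriteClient();

        // Micro batch 0: succeeds, then the sink closes the client,
        // which also closes the embedded server.
        client.commitBatch(0);
        client.close();

        // Micro batch 1: server-routed marker creation now fails,
        // because the server was torn down with the client.
        try {
            client.commitBatch(1);
            System.out.println("batch 1 succeeded");
        } catch (IOException e) {
            System.out.println("batch 1 failed: " + e.getMessage());
        }
    }
}
```

This matches the observation above: with direct markers (no server involved) batch 1 succeeds, but with server-based markers the close-after-first-batch ordering makes every subsequent batch fail, which is why the write-client closure in the streaming sink seems worth revisiting.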
