nsivabalan commented on pull request #3967:
URL: https://github.com/apache/hudi/pull/3967#issuecomment-967486437


   @vinothchandar : wrt structured streaming and timeline server being closed 
ahead, this is what I see. After the end of first micro batch, the write client 
is closed and hence triggers closure of timeline service. but subsequent micro 
batches do succeed though. 
   
   I added some logs for testStructuredStreaming. 
   ```
   1833 [main] WARN  org.apache.spark.util.Utils  - Service 'SparkUI' could not 
bind on port 4040. Attempting port 4041.
   streaming starting
   10085 [stream execution thread for [id = 
41aab979-66f5-4a58-82bc-751838639d48, runId = 
e3a570e9-9cb2-4216-af1d-791f563be9ba]] WARN  
org.apache.hudi.HoodieStreamingSink  - writing to HoodieSparkSqlWriter
   10381 [stream execution thread for [id = 
41aab979-66f5-4a58-82bc-751838639d48, runId = 
e3a570e9-9cb2-4216-af1d-791f563be9ba]] WARN  
org.apache.hudi.client.AbstractHoodieClient  - Constructor 
   10381 [stream execution thread for [id = 
41aab979-66f5-4a58-82bc-751838639d48, runId = 
e3a570e9-9cb2-4216-af1d-791f563be9ba]] WARN  
org.apache.hudi.client.AbstractHoodieClient  - Starting ETL server
   10381 [stream execution thread for [id = 
41aab979-66f5-4a58-82bc-751838639d48, runId = 
e3a570e9-9cb2-4216-af1d-791f563be9ba]] WARN  
org.apache.hudi.client.AbstractHoodieClient  - Creating embedded timeline 
server 
   10387 [stream execution thread for [id = 
41aab979-66f5-4a58-82bc-751838639d48, runId = 
e3a570e9-9cb2-4216-af1d-791f563be9ba]] WARN  
org.apache.hudi.client.embedded.EmbeddedTimelineService  - Starting timeline 
server :: 
   10625 [stream execution thread for [id = 
41aab979-66f5-4a58-82bc-751838639d48, runId = 
e3a570e9-9cb2-4216-af1d-791f563be9ba]] WARN  
org.apache.hudi.client.AbstractHoodieWriteClient  - Constructor 
   13093 [pool-18-thread-2] WARN  
org.apache.hudi.table.marker.DirectWriteMarkers  - Creating marker file 
/var/folders/ym/8yjkm3n90kq8tk4gfmvk7y140000gn/T/junit3225050705126728730/dataset/dest/.hoodie/.temp/20211112152238/2016/03/15/f743c5c8-6b42-4d4d-93df-2a34d7a3e9d8-0_0-26-248_20211112152238.parquet.marker.CREATE
   13802 [pool-20-thread-2] WARN  
org.apache.hudi.table.marker.DirectWriteMarkers  - Creating marker file 
/var/folders/ym/8yjkm3n90kq8tk4gfmvk7y140000gn/T/junit3225050705126728730/dataset/dest/.hoodie/.temp/20211112152238/2015/03/16/ff7dc47f-3c79-4472-98b0-09b716a5698a-0_1-32-249_20211112152238.parquet.marker.CREATE
   13802 [pool-19-thread-2] WARN  
org.apache.hudi.table.marker.DirectWriteMarkers  - Creating marker file 
/var/folders/ym/8yjkm3n90kq8tk4gfmvk7y140000gn/T/junit3225050705126728730/dataset/dest/.hoodie/.temp/20211112152238/2015/03/17/e9193d74-3e04-4096-8e05-a3b872df0a0e-0_2-32-250_20211112152238.parquet.marker.CREATE
   14231 [stream execution thread for [id = 
41aab979-66f5-4a58-82bc-751838639d48, runId = 
e3a570e9-9cb2-4216-af1d-791f563be9ba]] WARN  
org.apache.hudi.table.marker.DirectWriteMarkers  - Returning marker files 
   14430 [stream execution thread for [id = 
41aab979-66f5-4a58-82bc-751838639d48, runId = 
e3a570e9-9cb2-4216-af1d-791f563be9ba]] WARN  
org.apache.hudi.client.AbstractHoodieWriteClient  - Committing 20211112152238 
commit
   14431 [stream execution thread for [id = 
41aab979-66f5-4a58-82bc-751838639d48, runId = 
e3a570e9-9cb2-4216-af1d-791f563be9ba]] WARN  
org.apache.hudi.client.AbstractHoodieWriteClient  - Close() 
   14431 [stream execution thread for [id = 
41aab979-66f5-4a58-82bc-751838639d48, runId = 
e3a570e9-9cb2-4216-af1d-791f563be9ba]] WARN  
org.apache.hudi.client.embedded.EmbeddedTimelineService  - Trying to close 
timeline server
   14431 [stream execution thread for [id = 
41aab979-66f5-4a58-82bc-751838639d48, runId = 
e3a570e9-9cb2-4216-af1d-791f563be9ba]] WARN  
org.apache.hudi.client.embedded.EmbeddedTimelineService  - Closing Timeline 
server
   14446 [stream execution thread for [id = 
41aab979-66f5-4a58-82bc-751838639d48, runId = 
e3a570e9-9cb2-4216-af1d-791f563be9ba]] WARN  
org.apache.hudi.HoodieStreamingSink  - Micro batch id=0 succeeded for 
commit=20211112152238
   14446 [stream execution thread for [id = 
41aab979-66f5-4a58-82bc-751838639d48, runId = 
e3a570e9-9cb2-4216-af1d-791f563be9ba]] WARN  
org.apache.hudi.HoodieStreamingSink  - Micro batch id=0 succeeded
   14494 [stream execution thread for [id = 
41aab979-66f5-4a58-82bc-751838639d48, runId = 
e3a570e9-9cb2-4216-af1d-791f563be9ba]] WARN  
org.apache.spark.sql.execution.streaming.ProcessingTimeExecutor  - Current 
batch is falling behind. The trigger interval is 100 milliseconds, but spent 
4614 milliseconds
   15031 [ForkJoinPool-1-worker-11] WARN  org.apache.spark.util.Utils  - 
Truncated the string representation of a plan since it was too large. This 
behavior can be adjusted by setting 'spark.debug.maxToStringFields' in 
SparkEnv.conf.
   15571 [stream execution thread for [id = 
41aab979-66f5-4a58-82bc-751838639d48, runId = 
e3a570e9-9cb2-4216-af1d-791f563be9ba]] WARN  
org.apache.hudi.HoodieStreamingSink  - writing to HoodieSparkSqlWriter
   17992 [Executor task launch worker for task 315] WARN  
org.apache.hudi.table.marker.DirectWriteMarkers  - Creating marker file 
/var/folders/ym/8yjkm3n90kq8tk4gfmvk7y140000gn/T/junit3225050705126728730/dataset/dest/.hoodie/.temp/20211112152243/2016/03/15/f743c5c8-6b42-4d4d-93df-2a34d7a3e9d8-0_0-67-315_20211112152243.parquet.marker.MERGE
   18353 [Executor task launch worker for task 316] WARN  
org.apache.hudi.table.marker.DirectWriteMarkers  - Creating marker file 
/var/folders/ym/8yjkm3n90kq8tk4gfmvk7y140000gn/T/junit3225050705126728730/dataset/dest/.hoodie/.temp/20211112152243/2015/03/16/ff7dc47f-3c79-4472-98b0-09b716a5698a-0_1-73-316_20211112152243.parquet.marker.MERGE
   18353 [Executor task launch worker for task 317] WARN  
org.apache.hudi.table.marker.DirectWriteMarkers  - Creating marker file 
/var/folders/ym/8yjkm3n90kq8tk4gfmvk7y140000gn/T/junit3225050705126728730/dataset/dest/.hoodie/.temp/20211112152243/2015/03/17/e9193d74-3e04-4096-8e05-a3b872df0a0e-0_2-73-317_20211112152243.parquet.marker.MERGE
   18790 [stream execution thread for [id = 
41aab979-66f5-4a58-82bc-751838639d48, runId = 
e3a570e9-9cb2-4216-af1d-791f563be9ba]] WARN  
org.apache.hudi.table.marker.DirectWriteMarkers  - Returning marker files 
   18918 [stream execution thread for [id = 
41aab979-66f5-4a58-82bc-751838639d48, runId = 
e3a570e9-9cb2-4216-af1d-791f563be9ba]] WARN  
org.apache.hudi.client.AbstractHoodieWriteClient  - Committing 20211112152243 
commit
   18918 [stream execution thread for [id = 
41aab979-66f5-4a58-82bc-751838639d48, runId = 
e3a570e9-9cb2-4216-af1d-791f563be9ba]] WARN  
org.apache.hudi.client.AbstractHoodieWriteClient  - Close() 
   18918 [stream execution thread for [id = 
41aab979-66f5-4a58-82bc-751838639d48, runId = 
e3a570e9-9cb2-4216-af1d-791f563be9ba]] WARN  
org.apache.hudi.HoodieStreamingSink  - Micro batch id=1 succeeded for 
commit=20211112152243
   18918 [stream execution thread for [id = 
41aab979-66f5-4a58-82bc-751838639d48, runId = 
e3a570e9-9cb2-4216-af1d-791f563be9ba]] WARN  
org.apache.hudi.HoodieStreamingSink  - Micro batch id=1 succeeded
   18952 [stream execution thread for [id = 
41aab979-66f5-4a58-82bc-751838639d48, runId = 
e3a570e9-9cb2-4216-af1d-791f563be9ba]] WARN  
org.apache.spark.sql.execution.streaming.ProcessingTimeExecutor  - Current 
batch is falling behind. The trigger interval is 100 milliseconds, but spent 
3472 milliseconds
   streaming ends
   20593 [ForkJoinPool-1-worker-11] WARN  
org.apache.hudi.metadata.HoodieBackedTableMetadata  - Metadata table was not 
found at path 
file:/var/folders/ym/8yjkm3n90kq8tk4gfmvk7y140000gn/T/junit3225050705126728730/dataset/dest/.hoodie/metadata
   22315 [ForkJoinPool-1-worker-11] WARN  
org.apache.hudi.metadata.HoodieBackedTableMetadata  - Metadata table was not 
found at path 
file:/var/folders/ym/8yjkm3n90kq8tk4gfmvk7y140000gn/T/junit3225050705126728730/dataset/dest/.hoodie/metadata
   23465 [main] WARN  org.apache.hudi.testutils.HoodieClientTestHarness  - 
Closing file-system instance used in previous test-run
   23482 [stream execution thread for [id = 
41aab979-66f5-4a58-82bc-751838639d48, runId = 
e3a570e9-9cb2-4216-af1d-791f563be9ba]] WARN  
org.apache.spark.sql.execution.datasources.InMemoryFileIndex  - The directory 
file:/var/folders/ym/8yjkm3n90kq8tk4gfmvk7y140000gn/T/junit3225050705126728730/dataset/source
 was not found. Was it deleted very recently?
   
   ```
   
   Test was using direct marker types for the purpose of collecting logs. if I 
switch to timeline server based(intent of this patch), test will fail since the 
2nd batch marker creations fail. 
   Do you think we need to revisit the closure of write client in Structured 
streaming code? 
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to