[ 
https://issues.apache.org/jira/browse/SPARK-18187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15627134#comment-15627134
 ] 

Michael Armbrust commented on SPARK-18187:
------------------------------------------

I think the configuration should only be used when deciding if we should 
perform a new compaction.  The identification of a compaction vs a delta should 
be done based on the file itself.  Today this could be done by looking for the 
{{compact}} suffix.  However, I think this mechanism also has issues, as two 
streams writing to the same log but with different configurations would fail to 
conflict.

That said, I think fixing the latter issue is going to require us rev-ing the 
log version.  Since that's not free, we would probably want to see if there are 
other changes we should lump into the new version.  Given that, I'd be okay 
keeping the existing format, looking at file names instead of modular 
arithmetic, and revisiting moving the compaction identifier into the log itself 
(rather than the filename) in a follow-up.
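
To illustrate the difference between the two checks, here is a minimal sketch (in Python for brevity, not the actual Spark code). The function names, the {{.compact}} suffix constant, and the {{(batch_id + 1) % interval}} rule are assumptions made for the example:

```python
# Assumed suffix for compaction files, following the {{compact}} naming above.
COMPACT_SUFFIX = ".compact"


def is_compaction_batch_by_interval(batch_id: int, compact_interval: int) -> bool:
    """Config-based check (hypothetical): uses modular arithmetic, so the
    answer changes whenever the user changes compact_interval."""
    return (batch_id + 1) % compact_interval == 0


def is_compaction_batch_by_name(file_name: str) -> bool:
    """File-based check (hypothetical): depends only on what was written."""
    return file_name.endswith(COMPACT_SUFFIX)


# A log written while the interval was 3, so batch 2 was compacted.
written_files = {0: "0", 1: "1", 2: "2.compact", 3: "3"}

# Suppose the user later restarts the stream with the interval set to 2.
new_interval = 2

for batch_id, name in written_files.items():
    by_interval = is_compaction_batch_by_interval(batch_id, new_interval)
    by_name = is_compaction_batch_by_name(name)
    print(batch_id, name, by_interval, by_name)
```

With the changed interval, the config-based check flags batches 1 and 3 and misses batch 2, while the file-name check still identifies batch 2 correctly.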

> CompactibleFileStreamLog should not rely on "compactInterval" to detect a 
> compaction batch
> ------------------------------------------------------------------------------------------
>
>                 Key: SPARK-18187
>                 URL: https://issues.apache.org/jira/browse/SPARK-18187
>             Project: Spark
>          Issue Type: Bug
>          Components: Structured Streaming
>    Affects Versions: 2.0.1
>            Reporter: Shixiong Zhu
>
> Right now CompactibleFileStreamLog uses compactInterval to check if a batch 
> is a compaction batch. However, since this conf is controlled by the user, 
> they may just change it, and CompactibleFileStreamLog will just silently 
> return the wrong answer.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
