[jira] [Commented] (CARBONDATA-3130) CarbonData Support Flink

Nicholas Jiang (JIRA) Thu, 06 Dec 2018 23:46:51 -0800


    [ 
https://issues.apache.org/jira/browse/CARBONDATA-3130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16712447#comment-16712447
 ]


Nicholas Jiang commented on CARBONDATA-3130:
--------------------------------------------

# Each Flink process uses the SDK to create its own online segment directory 
(with a UUID as the directory name).
# The process then continuously writes new data files to this directory. The 
directory also contains an index file that records which data files are valid.
# When the online segment directory reaches a certain size, a handoff is 
triggered, that is, the table status metadata is modified. The SDK then 
creates a new online segment directory, and the process repeats from step 2.
# To query an online segment, first read the index file to get the list of 
valid data files, and then read each of those files.
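
As a rough illustration of steps 1 to 3, here is a minimal Python sketch of a per-process writer. It is not the CarbonData SDK API: the class name OnlineSegmentWriter, the index file name index.json, its JSON format, the part-NNNNN.data naming, and the in-memory handed_off list (a stand-in for the table status metadata) are all assumptions made for the example.

```python
import json
import os
import tempfile
import uuid


class OnlineSegmentWriter:
    """Hypothetical per-process writer; not the CarbonData SDK API."""

    INDEX_NAME = "index.json"  # assumed name and format of the index file

    def __init__(self, table_dir, handoff_bytes):
        self.table_dir = table_dir
        self.handoff_bytes = handoff_bytes
        self.handed_off = []  # stand-in for the table status metadata
        self._open_segment()

    def _open_segment(self):
        # Step 1: each process owns a directory named by a UUID.
        self.segment_dir = os.path.join(self.table_dir, uuid.uuid4().hex)
        os.makedirs(self.segment_dir)
        self.valid_files = []
        self.size = 0
        self._write_index()

    def _write_index(self):
        # Write to a temp file, then rename, so a reader never sees
        # a partially written index.
        tmp = os.path.join(self.segment_dir, self.INDEX_NAME + ".tmp")
        with open(tmp, "w") as f:
            json.dump(self.valid_files, f)
        os.replace(tmp, os.path.join(self.segment_dir, self.INDEX_NAME))

    def write(self, payload):
        # Step 2: write a new data file, then record it as valid.
        name = "part-%05d.data" % len(self.valid_files)
        with open(os.path.join(self.segment_dir, name), "wb") as f:
            f.write(payload)
        self.valid_files.append(name)
        self.size += len(payload)
        self._write_index()
        # Step 3: hand off once the size threshold is reached,
        # then start a fresh online segment directory.
        if self.size >= self.handoff_bytes:
            self.handed_off.append(self.segment_dir)
            self._open_segment()
```

A data file is added to the index only after it has been closed, which is what allows a concurrent reader to trust every name it finds there.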
* The index file keeps queries from reading a data file that is only half 
flushed. It must record the names of the valid data files in the current 
online segment directory, and it can optionally carry min/max statistics. 
A reader first reads the index file to get the list of data file paths, and 
then reads those files.
* Each process has its own online segment directory, so processes can write 
concurrently. This mechanism suits scenarios without a central coordinator, 
such as Flink, Kafka Streams, Cassandra, etc.
* Reading while writing means that Flink is ingesting data while Spark/Presto 
is querying it. In this case the query side must not read a data file that 
has not been completely written.
* The essential difference between an online segment and a stream segment is 
that the former is process-level (no scheduling, multiple active), while the 
latter is application-level (with scheduling, only one active).
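
The read path described above (index first, then only the listed files) might look like the following Python sketch. The function name read_online_segment, the index file name index.json, and its JSON list format are assumptions for illustration, not part of CarbonData.

```python
import json
import os
import tempfile


def read_online_segment(segment_dir, index_name="index.json"):
    """Return the contents of every data file listed in the index.

    Files that exist in the directory but are not listed in the
    index are treated as still being flushed and are skipped.
    """
    with open(os.path.join(segment_dir, index_name)) as f:
        valid_files = json.load(f)
    contents = []
    for name in valid_files:
        with open(os.path.join(segment_dir, name), "rb") as f:
            contents.append(f.read())
    return contents


def demo():
    with tempfile.TemporaryDirectory() as segment_dir:
        # A fully flushed file, listed in the index.
        with open(os.path.join(segment_dir, "part-0.data"), "wb") as f:
            f.write(b"complete")
        # A file still being flushed: on disk, but not yet in the index.
        with open(os.path.join(segment_dir, "part-1.data"), "wb") as f:
            f.write(b"par")
        with open(os.path.join(segment_dir, "index.json"), "w") as f:
            json.dump(["part-0.data"], f)
        return read_online_segment(segment_dir)
```

Because the reader never lists the directory itself, the half-written part-1.data is invisible to the query until the writer adds it to the index.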


> CarbonData Support Flink
> ------------------------
>
>                 Key: CARBONDATA-3130
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-3130
>             Project: CarbonData
>          Issue Type: New Feature
>          Components: flink-integration
>            Reporter: Nicholas Jiang
>            Assignee: Nicholas Jiang
>            Priority: Minor
>
> For streaming warehousing scenarios, CarbonData should support Flink.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
