[ 
https://issues.apache.org/jira/browse/HUDI-1896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17378920#comment-17378920
 ] 

ASF GitHub Bot commented on HUDI-1896:
--------------------------------------

hudi-bot edited a comment on pull request #3256:
URL: https://github.com/apache/hudi/pull/3256#issuecomment-877963467


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "74b342890e833e84bf1f8e163465df46b325845a",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=853",
       "triggerID" : "74b342890e833e84bf1f8e163465df46b325845a",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 74b342890e833e84bf1f8e163465df46b325845a Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=853)
 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run travis` re-run the last Travis build
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


> [UMBRELLA] Implement DeltaStreamer Source for cloud object stores
> -----------------------------------------------------------------
>
>                 Key: HUDI-1896
>                 URL: https://issues.apache.org/jira/browse/HUDI-1896
>             Project: Apache Hudi
>          Issue Type: New Feature
>          Components: DeltaStreamer
>            Reporter: Raymond Xu
>            Priority: Critical
>              Labels: pull-request-available
>
> As discussed in HUDI-1723, we need a better implementation for cloud object 
> storage such as AWS S3 or GCS, leveraging change notifications.
> Also consider 
> [https://docs.databricks.com/spark/latest/structured-streaming/sqs.html]
>  
> We need to look into the current *DFSSource classes and see if we can add a new 
> `DFSPathSelector` implementation that fetches new files on cloud storage 
> after a given point in time. The timestamp-based approach used by the existing 
> path selector largely works, but has corner cases, as mentioned in HUDI-1723.
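
To make the timestamp-based idea in the description concrete, here is a minimal Python sketch of selecting files modified after a checkpoint. It is not Hudi's `DFSPathSelector` code: `FileEntry`, `select_new_files`, and the `max_files` batch limit are hypothetical names introduced only for illustration, and the corner case it demonstrates (files sharing the checkpoint's modification time being skipped) is one plausible example, not necessarily the one tracked in HUDI-1723.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class FileEntry:
    """Hypothetical stand-in for one entry of an object-store listing."""
    path: str
    mod_time_ms: int


def select_new_files(
    listing: Iterable[FileEntry],
    checkpoint_ms: int,
    max_files: Optional[int] = None,
) -> Tuple[List[FileEntry], int]:
    """Return files modified strictly after checkpoint_ms (oldest first)
    together with the checkpoint to use for the next call."""
    candidates = sorted(
        (f for f in listing if f.mod_time_ms > checkpoint_ms),
        key=lambda f: (f.mod_time_ms, f.path),
    )
    if max_files is not None:
        candidates = candidates[:max_files]
    # The next checkpoint is the newest modification time actually consumed.
    new_checkpoint = candidates[-1].mod_time_ms if candidates else checkpoint_ms
    return candidates, new_checkpoint


if __name__ == "__main__":
    listing = [FileEntry("a", 100), FileEntry("b", 100), FileEntry("c", 200)]
    first, cp = select_new_files(listing, checkpoint_ms=0, max_files=1)
    second, cp = select_new_files(listing, checkpoint_ms=cp, max_files=1)
    # "b" shares its modification time with the first checkpoint, so the
    # strictly-greater comparison never picks it up.
    print([f.path for f in first], [f.path for f in second])
```

In this sketch the second call returns only `c`, so `b` is never ingested. A change-notification source would avoid this class of problem by consuming one event per object rather than comparing timestamps.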



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
