[ https://issues.apache.org/jira/browse/FLUME-2437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14329047#comment-14329047 ]

Ashish Paliwal commented on FLUME-2437:
---------------------------------------

You understood my question correctly.

I pushed some commits today. It's not working yet, but it should give you a direction.
It's based on the Spooling Directory Source, using JetS3t. I am stuck at
ResettableS3ObjectInputStream: I'm trying to make it seekable, but it won't work in
its current form. If the connection breaks during reading, we have to re-read all
the bytes from the start to get back to the position. Have a look at it; I won't be
touching it until tomorrow. If you update anything, please send a PR. In ZooKeeper (ZK),
we need to store the file name plus a marker (the position up to which events have been processed).
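To make the "file name plus marker" idea concrete, here is a minimal sketch of what could be stored per file in ZK. The class name `ProgressMarker` and the `key\noffset` UTF-8 encoding are hypothetical choices for illustration, not what the pushed commits do. Writing the bytes would go through the ZooKeeper client (for example `ZooKeeper.setData(path, data, version)` on a znode), which is left out so the sketch stays self-contained.

```java
import java.nio.charset.Charset;

/**
 * Hypothetical progress marker: the S3 object key being read plus the byte
 * offset up to which events have been committed to the channel.
 */
final class ProgressMarker {
    private static final Charset UTF8 = Charset.forName("UTF-8");

    final String objectKey;
    final long offset;

    ProgressMarker(String objectKey, long offset) {
        if (offset < 0) {
            throw new IllegalArgumentException("offset must be >= 0");
        }
        this.objectKey = objectKey;
        this.offset = offset;
    }

    /** Encodes as "key\noffset" in UTF-8, suitable as a znode's data. */
    byte[] toBytes() {
        return (objectKey + "\n" + offset).getBytes(UTF8);
    }

    /** Parses the encoding produced by toBytes(). */
    static ProgressMarker fromBytes(byte[] data) {
        String s = new String(data, UTF8);
        // S3 keys may themselves contain '\n', so split on the LAST newline.
        int nl = s.lastIndexOf('\n');
        if (nl < 0) {
            throw new IllegalArgumentException("malformed marker: " + s);
        }
        return new ProgressMarker(s.substring(0, nl), Long.parseLong(s.substring(nl + 1)));
    }
}
```

On restart, the source would read the marker back, reopen that object, and skip to `offset`, so events before it are not re-delivered.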

Well, if we don't rename files, we can't apply a filter. Do you expect customers
to have thousands of files in a single bucket? Getting a big list on every
list API call is a concern. Anyway, these are optimisations; let's get it
working first :)
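One possible way to avoid both renaming and re-listing the whole bucket (an assumption, not something settled in this thread): S3's ListObjects call returns keys in UTF-8 binary order and accepts a `marker` parameter meaning "start listing after this key", so the source could remember the last key it fully processed and request only keys after it. The sketch below simulates that paging over an in-memory sorted set; `MarkerListing.nextPage` is a hypothetical helper, and a real implementation would pass the marker to the S3 list call instead.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;

/** Hypothetical helper mimicking S3 ListObjects "marker" paging. */
final class MarkerListing {
    /**
     * Returns up to maxKeys keys strictly after marker (or from the first key
     * if marker is null). TreeSet<String> orders by UTF-16 code units, which
     * matches S3's UTF-8 binary order for ASCII keys; ASCII keys are assumed.
     */
    static List<String> nextPage(NavigableSet<String> keys, String marker, int maxKeys) {
        NavigableSet<String> tail = (marker == null) ? keys : keys.tailSet(marker, false);
        List<String> page = new ArrayList<String>();
        for (String k : tail) {
            if (page.size() == maxKeys) {
                break;
            }
            page.add(k);
        }
        return page;
    }
}
```

The trade-off is that a key which sorts before the saved marker but shows up later would never be picked up, so this only works if new object keys sort after old ones (for example, date-based prefixes).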

In a nutshell, we need to get the following working:
1. ResettableS3ObjectInputStream needs fixing. I wanted to keep the channel-based
implementation intact, but that's not possible until we move to a JDK that has
SeekableByteChannel (added in JDK 7). For now, we just need to fix the refillBuffer() method.
2. Get the main flow working.
3. Support multiple buckets, perhaps by adding an abstraction like BucketInfo.
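For item 1, here is a minimal sketch of one way refillBuffer()-style recovery could avoid re-reading from the start. It tracks the absolute offset consumed so far and, on an IOException, reopens the object at that offset. With S3 this reopen would be a ranged GET (an HTTP `Range: bytes=<offset>-` request). `ResumableStream` and the `Opener` interface are hypothetical names, no particular JetS3t call is assumed, and the fixed retry count is a placeholder policy.

```java
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical: opens the object starting at a byte offset (e.g. via an HTTP Range GET). */
interface Opener {
    InputStream openAt(long offset) throws IOException;
}

/** Hypothetical stream that reopens its source at the current offset after a read failure. */
final class ResumableStream extends InputStream {
    private final Opener opener;
    private final int maxRetries;
    private InputStream in;
    private long position; // absolute offset of the next byte to be returned

    ResumableStream(Opener opener, long startOffset, int maxRetries) throws IOException {
        this.opener = opener;
        this.maxRetries = maxRetries;
        this.position = startOffset;
        this.in = opener.openAt(startOffset);
    }

    /** Offset up to which bytes have been handed out; a candidate for the ZK marker. */
    long position() {
        return position;
    }

    @Override
    public int read() throws IOException {
        byte[] one = new byte[1]; // simple, not efficient; callers should prefer bulk reads
        int n = read(one, 0, 1);
        return n == -1 ? -1 : (one[0] & 0xff);
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        int attempt = 0;
        while (true) {
            try {
                int n = in.read(buf, off, len);
                if (n > 0) {
                    position += n;
                }
                return n;
            } catch (IOException e) {
                if (++attempt > maxRetries) {
                    throw e;
                }
                closeQuietly();
                in = opener.openAt(position); // resume here instead of re-reading from 0
            }
        }
    }

    @Override
    public void close() throws IOException {
        in.close();
    }

    private void closeQuietly() {
        try {
            in.close();
        } catch (IOException ignored) {
            // the connection is already broken; nothing useful to do
        }
    }
}
```

With this approach a dropped connection costs one extra request instead of re-downloading every byte already consumed, and it does not need SeekableByteChannel.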

Once #1 is fixed, we can get into the main flow, see the Source in action, and then
work on refining it to production grade.

> S3 Source
> ---------
>
>                 Key: FLUME-2437
>                 URL: https://issues.apache.org/jira/browse/FLUME-2437
>             Project: Flume
>          Issue Type: New Feature
>            Reporter: Jonathan Natkins
>            Assignee: Ashish Paliwal
>
> There have been multiple requests on the mailing list for an S3 source



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
