[
https://issues.apache.org/jira/browse/FLUME-2437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14329047#comment-14329047
]
Ashish Paliwal commented on FLUME-2437:
---------------------------------------
You got my question correctly.
I pushed some commits today. It's not working yet, but it should give you a
direction. It's based on the Spooled Dir Source, using JetS3t. I am stuck at
ResettableS3ObjectInputStream: I am trying to make it seekable, but it won't
work in its current form. If the connection breaks during reading, we have to
re-read all the bytes to get back to the position. Have a look at it, I won't
be touching it till tomorrow. If you update anything, please send a PR. In
ZooKeeper, we need to store the file name plus a marker (the point up to which
events were processed).
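To make the "file name plus marker" idea concrete, here is a rough sketch of
the record that could be written as the data of a ZooKeeper node. The class
name ProcessedMarker, the "offset:key" encoding, and the reading of "marker"
as a byte offset are all assumptions for illustration, not what the pushed
commits do. The ZooKeeper calls themselves are left out.

```java
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical checkpoint record: the S3 object key plus the byte offset
 * up to which events were processed (assumed meaning of "marker").
 */
final class ProcessedMarker {
    final String key;
    final long offset;

    ProcessedMarker(String key, long offset) {
        if (key == null || key.isEmpty()) {
            throw new IllegalArgumentException("key required");
        }
        if (offset < 0) {
            throw new IllegalArgumentException("offset must be >= 0");
        }
        this.key = key;
        this.offset = offset;
    }

    /** Encodes as "offset:key"; the offset goes first so keys containing ':' still parse. */
    byte[] toBytes() {
        return (offset + ":" + key).getBytes(StandardCharsets.UTF_8);
    }

    /** Decodes the bytes produced by toBytes(). */
    static ProcessedMarker fromBytes(byte[] data) {
        String s = new String(data, StandardCharsets.UTF_8);
        int sep = s.indexOf(':');
        if (sep < 0) {
            throw new IllegalArgumentException("malformed marker: " + s);
        }
        long offset = Long.parseLong(s.substring(0, sep));
        return new ProcessedMarker(s.substring(sep + 1), offset);
    }
}
```

On restart, the source would read these bytes back, reopen the named object,
and resume from the stored offset.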
Well, if we don't rename files, we can't apply a filter. Do you see customers
having thousands of files in a single bucket? Getting a big list during each
list API call is a concern. Anyway, these are optimisations, let's get it
working first :)
In a nutshell, we need to get the following working:
1. ResettableS3ObjectInputStream needs fixing. Wanted to keep the channel based
implementation intact, but that's not possible till we move to JDK 8 where we
have SeekableByteChannel. Here we just need to fix the refillBuffer() API
2. Get to the main flow
3. Support multiple buckets, perhaps add abstraction like BucketInfo
If we fix #1, we can get into the main flow and see the Source in action, and
then work on refining it to production grade.
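For #1, the "re-read all the bytes to get back to the position" fallback could
look roughly like the sketch below. It is not the ResettableS3ObjectInputStream
from the commits: StreamOpener and ReopeningInputStream are hypothetical names,
and the opener stands in for whatever call fetches the S3 object from byte 0.
Only standard java.io is used.

```java
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical factory: opens a fresh stream positioned at byte 0 of the object. */
interface StreamOpener {
    InputStream open() throws IOException;
}

/** Sketch: survives a broken connection by reopening and skipping to the last position. */
class ReopeningInputStream extends InputStream {
    private final StreamOpener opener;
    private final int maxRetries;
    private InputStream in;
    private long position; // bytes successfully handed to the caller
    private long mark;     // position remembered by markPosition()

    ReopeningInputStream(StreamOpener opener, int maxRetries) throws IOException {
        this.opener = opener;
        this.maxRetries = Math.max(0, maxRetries);
        this.in = opener.open();
    }

    @Override
    public int read() throws IOException {
        IOException last = null;
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            try {
                if (in == null) {
                    reopenAt(position); // previous attempt failed: start over and skip
                }
                int b = in.read();
                if (b >= 0) {
                    position++;
                }
                return b;
            } catch (IOException e) {
                last = e;
                closeQuietly();
            }
        }
        throw last;
    }

    /** Remembers the current position, e.g. after a batch of events is committed. */
    void markPosition() {
        mark = position;
    }

    /** Goes back to the last marked position by reopening and skipping. */
    void resetToMark() throws IOException {
        closeQuietly();
        reopenAt(mark);
    }

    long position() {
        return position;
    }

    private void reopenAt(long target) throws IOException {
        InputStream fresh = opener.open();
        try {
            long remaining = target;
            while (remaining > 0) {
                long skipped = fresh.skip(remaining);
                if (skipped <= 0) {
                    // skip() may return 0; fall back to reading a single byte
                    if (fresh.read() < 0) {
                        throw new IOException("object shorter than " + target + " bytes");
                    }
                    skipped = 1;
                }
                remaining -= skipped;
            }
        } catch (IOException e) {
            try { fresh.close(); } catch (IOException ignored) { }
            throw e;
        }
        in = fresh;
        position = target;
    }

    private void closeQuietly() {
        if (in != null) {
            try { in.close(); } catch (IOException ignored) { }
            in = null;
        }
    }

    @Override
    public void close() {
        closeQuietly();
    }
}
```

The cost is the one already noted: every reconnect transfers the object again
from the start up to the current position. A ranged GET would avoid that, if
the client library exposes one.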
> S3 Source
> ---------
>
> Key: FLUME-2437
> URL: https://issues.apache.org/jira/browse/FLUME-2437
> Project: Flume
> Issue Type: New Feature
> Reporter: Jonathan Natkins
> Assignee: Ashish Paliwal
>
> There have been multiple requests on the mailing list for an S3 source