On 2/13/17, 12:14 PM, "Sachin Pasalkar" <[email protected]> wrote:
>I have attached updated source code of HDFSSpout for more reference. I
>have updated respective classes (not attached)

Don't see any attachment. Answers are below. Better to do this discussion
on a JIRA.

On 2/13/17, 8:32 AM, "Sachin Pasalkar" <[email protected]> wrote:

>Hi,
>
>I was looking at the storm hdfs spout code in the 1.x branch, and I found
>that the improvements below can be made.
>
> 1. Make org.apache.storm.hdfs.spout.AbstractFileReader public so that it
>can be used in generics.

Java generics and making a class public are unrelated, to my knowledge.
But making it public sounds OK to me if it's useful for "user defined"
readers … although it doesn't really have that much going on in it. For
future built-in reader types it is immaterial, as they can derive from it
anyway, just like the existing ones. The HdfsSpout class itself doesn't
care about the 'AbstractFileReader' type; for that there is the
'FileReader' interface.

> 2. org.apache.storm.hdfs.spout.HdfsSpout requires readerType as a
>String. It would be great to have Class<? extends AbstractFileReader>
>readerType, so we would not use Class.forName in multiple places; it
>would also help with the point below.

The reason it is a string is that, for built-in readers, we wanted to
support 'short aliases' like 'text' and 'seq' instead of the FQCN.

> 3. HdfsSpout also needs to provide outFields, which are declared as
>constants in each reader (e.g. SequenceFileReader). We could have an
>abstract API in AbstractFileReader which returns them, to make it
>generic.

These constants can't go into AbstractFileReader as they are reader
specific. They are there just for convenience. Users can call
withOutputFields() on the spout and set it to these predefined names or
anything else.

-Roshan
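
For illustration, here is a minimal sketch of wiring up the spout with the
'seq' short alias and the reader's predefined field names. It assumes the
fluent setters described in the storm-hdfs README (setReaderType, setHdfsUri,
setSourceDir, setArchiveDir, setBadFilesDir) and the SequenceFileReader
defaultFields constant; withOutputFields() and the 'seq'/'text' aliases are
the ones discussed above. The HDFS URI and directory paths are made-up
examples, not recommendations.

    import org.apache.storm.hdfs.spout.HdfsSpout;
    import org.apache.storm.hdfs.spout.SequenceFileReader;
    import org.apache.storm.topology.TopologyBuilder;

    public class HdfsSpoutSketch {
        public static void main(String[] args) {
            // Reader type is the short alias "seq" rather than the FQCN of
            // SequenceFileReader. Output fields reuse the reader's predefined
            // constants, but any field names could be passed instead.
            HdfsSpout spout = new HdfsSpout()
                    .setReaderType("seq")
                    .withOutputFields(SequenceFileReader.defaultFields)
                    .setHdfsUri("hdfs://namenode:8020")    // made-up URI
                    .setSourceDir("/data/in")               // made-up directories
                    .setArchiveDir("/data/done")
                    .setBadFilesDir("/data/bad");

            TopologyBuilder builder = new TopologyBuilder();
            builder.setSpout("hdfs-spout", spout, 1);
            // ... bolts and topology submission omitted
        }
    }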
