You've misunderstood what the reader is for. The reader doesn't care about the input format. It just takes the input as Writable key/value pairs, so for text input these are LongWritable/Text pairs. For a NoSQL source they might be LongWritable/BytesWritable.
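
To make the point concrete, here is a minimal sketch, written in Python rather than Java so that it runs without Hadoop or Hama installed. Every name in it (`Vertex`, `parse_text_record`, `parse_bytes_record`, `load_vertices`) is hypothetical and only stands in for the idea; it is not Hama's actual reader API. It assumes each record is a tab-separated line whose first field is the vertex ID and whose remaining fields are edge targets.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Iterable, List, Optional, Tuple


@dataclass
class Vertex:
    """Hypothetical stand-in for a graph vertex: an ID plus edge targets."""
    vertex_id: str
    edges: List[str] = field(default_factory=list)


def parse_text_record(offset: int, line: str) -> Optional[Vertex]:
    """Text input: the key is an offset, the value is a tab-separated line.

    Returns None for blank lines so the caller can skip them.
    """
    parts = line.strip().split("\t")
    if not parts[0]:
        return None
    return Vertex(parts[0], parts[1:])


def parse_bytes_record(row: int, payload: bytes) -> Optional[Vertex]:
    """Binary input (for example from a NoSQL store): decode, then reuse the same logic."""
    return parse_text_record(row, payload.decode("utf-8"))


def load_vertices(
    records: Iterable[Tuple[Any, Any]],
    reader: Callable[[Any, Any], Optional[Vertex]],
) -> List[Vertex]:
    """Format-agnostic loader: it only ever sees (key, value) pairs."""
    vertices = []
    for key, value in records:
        vertex = reader(key, value)
        if vertex is not None:
            vertices.append(vertex)
    return vertices
```

The loader never learns where the pairs came from. Switching from a text file to another source only means passing a different reader function, which is the role the reader plays in this discussion.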
It's up to you to code this for your input sequence, not for each format. It is not hardcoded to text; only the examples use text.

2012/12/10 Edward J. Yoon <[email protected]>

> Again ... User can create their own InputFormatter to read records as
> a <Writable, ArrayWritable> from text file or sequence file, or
> NoSQLs.
>
> You can use K, V pairs and sequence file. Why do you want to use text
> file? Should I always write text file and parse them using
> VertexInputReader?
>
> On Tue, Dec 11, 2012 at 4:48 AM, Thomas Jungblut
> <[email protected]> wrote:
> >>
> >> It's a gap in experience, Thomas.
> >
> > Most probably you should read some good books on data extraction and then
> > choose your tools accordingly.
> > I never think that BSP is and will be a good extraction technique for
> > unstructured data.
> >
> > But these are just my two cents here- there seems to be somewhat more
> > political problems in this game than using tools appropriately.
> >
> > 2012/12/10 Thomas Jungblut <[email protected]>
> >
> >> Yes, if you preprocess your data correctly.
> >> I have done the same unstructured extraction with the movie database from
> >> IMDB and it worked fine.
> >> That's just not a job for BSP, but for MapReduce.
> >>
> >> 2012/12/10 Edward J. Yoon <[email protected]>
> >>
> >>> It's a gap in experience, Thomas. Do you think you can extract Twitter
> >>> mention graph using parseVertex?
> >>>
> >>> On Tue, Dec 11, 2012 at 4:34 AM, Thomas Jungblut
> >>> <[email protected]> wrote:
> >>> > I have trouble understanding you here.
> >>> >
> >>> > How can I generate large sample without coding?
> >>> >
> >>> > Do you mean random data generation or real-life data?
> >>> > Personally I think it is really convenient to transform unstructured data
> >>> > in a text file to vertices.
> >>> >
> >>> > 2012/12/10 Edward <[email protected]>
> >>> >
> >>> >> I mean, With or without input reader. How can I generate large sample
> >>> >> without coding?
> >>> >>
> >>> >> It's unnecessary feature. As I mentioned before, only good for simple and
> >>> >> small test.
> >>> >>
> >>> >> Sent from my iPhone
> >>> >>
> >>> >> On Dec 11, 2012, at 3:38 AM, Thomas Jungblut <[email protected]>
> >>> >> wrote:
> >>> >>
> >>> >> >>
> >>> >> >> In my case, generating test data is very annoying.
> >>> >> >
> >>> >> > Really? What is so difficult to generate tab separated text data?;)
> >>> >> > I think we shouldn't do this, but there seems to be very little interest in
> >>> >> > the community so I will not block your work on it.
> >>> >> >
> >>> >> > Good luck ;)
> >>> >>
> >>>
> >>> --
> >>> Best Regards, Edward J. Yoon
> >>> @eddieyoon
> >>>
> >>
> >
>
> --
> Best Regards, Edward J. Yoon
> @eddieyoon
