Sorry, text is exception.

On Tue, Dec 11, 2012 at 4:52 AM, Edward J. Yoon <[email protected]> wrote:
> Again ... User can create their own InputFormatter to read records as
> a <Writable, ArrayWritable> from text file or sequence file, or
> NoSQLs.
>
> You can use K, V pairs and sequence file. Why do you want to use text
> file? Should I always write text file and parse them using
> VertexInputReader?
>
>
> On Tue, Dec 11, 2012 at 4:48 AM, Thomas Jungblut
> <[email protected]> wrote:
>>>
>>> It's a gap in experience, Thomas.
>>
>>
>> Most probably you should read some good books on data extraction and then
>> choose your tools accordingly.
>> I never think that BSP is and will be a good extraction technique for
>> unstructured data.
>>
>> But these are just my two cents here- there seems to be somewhat more
>> political problems in this game than using tools appropriately.
>>
>> 2012/12/10 Thomas Jungblut <[email protected]>
>>
>>> Yes, if you preprocess your data correctly.
>>> I have done the same unstructured extraction with the movie database from
>>> IMDB and it worked fine.
>>> That's just not a job for BSP, but for MapReduce.
>>>
>>> 2012/12/10 Edward J. Yoon <[email protected]>
>>>
>>>> It's a gap in experience, Thomas. Do you think you can extract Twitter
>>>>
>>>> mention graph using parseVertex?
>>>>
>>>> On Tue, Dec 11, 2012 at 4:34 AM, Thomas Jungblut
>>>> <[email protected]> wrote:
>>>> > I have trouble understanding you here.
>>>> >
>>>> > How can I generate large sample without coding?
>>>> >
>>>> >
>>>> > Do you mean random data generation or real-life data?
>>>> > Personally I think it is really convenient to transform unstructured data
>>>> > in a text file to vertices.
>>>> >
>>>> >
>>>> > 2012/12/10 Edward <[email protected]>
>>>> >
>>>> >> I mean, With or without input reader. How can I generate large sample
>>>> >> without coding?
>>>> >>
>>>> >> It's unnecessary feature. As I mentioned before, only good for simple and
>>>> >> small test.
>>>> >>
>>>> >> Sent from my iPhone
>>>> >>
>>>> >> On Dec 11, 2012, at 3:38 AM, Thomas Jungblut <[email protected]>
>>>> >> wrote:
>>>> >>
>>>> >> >>
>>>> >> >> In my case, generating test data is very annoying.
>>>> >> >
>>>> >> >
>>>> >> > Really? What is so difficult to generate tab separated text data?;)
>>>> >> > I think we shouldn't do this, but there seems to be very little interest in
>>>> >> > the community so I will not block your work on it.
>>>> >> >
>>>> >> > Good luck ;)
>>>> >>
>>>>
>>>>
>>>>
>>>> --
>>>> Best Regards, Edward J. Yoon
>>>> @eddieyoon
>>>>
>>>
>>>
>
>
>
> --
> Best Regards, Edward J. Yoon
> @eddieyoon

--
Best Regards, Edward J. Yoon
@eddieyoon
