Again ... users can create their own InputFormatter to read records as <Writable, ArrayWritable> pairs from a text file, a sequence file, or a NoSQL store.
You can use K, V pairs and a sequence file. Why do you want to use a text file? Should I always write a text file and parse it using VertexInputReader?

On Tue, Dec 11, 2012 at 4:48 AM, Thomas Jungblut <[email protected]> wrote:
>>
>> It's a gap in experience, Thomas.
>
>
> Most probably you should read some good books on data extraction and then
> choose your tools accordingly.
> I never think that BSP is and will be a good extraction technique for
> unstructured data.
>
> But these are just my two cents here- there seems to be somewhat more
> political problems in this game than using tools appropriately.
>
> 2012/12/10 Thomas Jungblut <[email protected]>
>
>> Yes, if you preprocess your data correctly.
>> I have done the same unstructured extraction with the movie database from
>> IMDB and it worked fine.
>> That's just not a job for BSP, but for MapReduce.
>>
>> 2012/12/10 Edward J. Yoon <[email protected]>
>>
>>> It's a gap in experience, Thomas. Do you think you can extract Twitter
>>> mention graph using parseVertex?
>>>
>>> On Tue, Dec 11, 2012 at 4:34 AM, Thomas Jungblut
>>> <[email protected]> wrote:
>>> > I have trouble understanding you here.
>>> >
>>> > How can I generate large sample without coding?
>>> >
>>> >
>>> > Do you mean random data generation or real-life data?
>>> > Personally I think it is really convenient to transform unstructured data
>>> > in a text file to vertices.
>>> >
>>> >
>>> > 2012/12/10 Edward <[email protected]>
>>> >
>>> >> I mean, With or without input reader. How can I generate large sample
>>> >> without coding?
>>> >>
>>> >> It's unnecessary feature. As I mentioned before, only good for simple and
>>> >> small test.
>>> >>
>>> >> Sent from my iPhone
>>> >>
>>> >> On Dec 11, 2012, at 3:38 AM, Thomas Jungblut <[email protected]>
>>> >> wrote:
>>> >>
>>> >> >>
>>> >> >> In my case, generating test data is very annoying.
>>> >> >
>>> >> >
>>> >> > Really? What is so difficult to generate tab separated text data?;)
>>> >> > I think we shouldn't do this, but there seems to be very little interest in
>>> >> > the community so I will not block your work on it.
>>> >> >
>>> >> > Good luck ;)
>>> >>
>>>
>>>
>>>
>>> --
>>> Best Regards, Edward J. Yoon
>>> @eddieyoon
>>>
>>
>>

--
Best Regards, Edward J. Yoon
@eddieyoon
