Yes, but in patches and in issue HAMA-531, so we can review.

2012/12/10 Edward J. Yoon <[email protected]>
> We talked on gtalk, the conclusion is as below:
>
> "If there's no opinion, I'll remove VertexInputReader in
> GraphJobRunner, because it makes the code complex. Let's consider again
> about the VertexInputReader, after fixing HAMA-531 and HAMA-632
> issues."
>
> I'll clean them up tomorrow.
>
> On Tue, Dec 11, 2012 at 4:58 AM, Suraj Menon <[email protected]>
> wrote:
> > Hi Edward, I am assuming that you want to do this because you want to run
> > the job using more BSP tasks in parallel to reduce the memory usage per
> > task and perhaps run it faster.
> > Am I right? I am +1 if this makes things faster. However this would be
> > expensive for people with smaller clusters, and we should have spill, cache
> > and lookup implemented for Vertices in such cases.
> >
> > Regarding backward compatibility, can we use the user's VertexInputReader
> > to read the data and then write them in sequential file format we want? I
> > was discussing this with Thomas and we felt this could be done by
> > configuring a default input reader and overriding the same by
> > configuration. We would have to make the Vertex class Writable. I would
> > like to keep it backward compatible. Is this a possibility?
> >
> > Regarding run-time partitioning, not all partitioning would be based on
> > hash partitioning. I can have a partitioner based on color of the vertex or
> > some other property of the vertex. It is a step we can skip if not
> > configured by user.
> >
> > Just my 2 cents. We can deprecate things but let's not remove immediately.
> >
> > -Suraj
> >
> > HAMA-632 can wait until everything is resolved. I am trying to reduce the
> > API complexity.
> >
> > On Mon, Dec 10, 2012 at 2:56 PM, Thomas Jungblut
> > <[email protected]> wrote:
> >
> >> You didn't get the use of the reader.
> >> The reader doesn't care about the input format.
> >> It just takes the input as Writable, so for Text this is LongWritable/Text
> >> pairs.
> >> For NoSQL this might be LongWritable/BytesWritable.
> >>
> >> It's up to you coding this for your input sequence, not for each format.
> >> This is not hardcoded to text, only in the examples.
> >>
> >> 2012/12/10 Edward J. Yoon <[email protected]>
> >>
> >> > Again ... User can create their own InputFormatter to read records as
> >> > a <Writable, ArrayWritable> from text file or sequence file, or
> >> > NoSQLs.
> >> >
> >> > You can use K, V pairs and sequence file. Why do you want to use text
> >> > file? Should I always write text file and parse them using
> >> > VertexInputReader?
> >> >
> >> > On Tue, Dec 11, 2012 at 4:48 AM, Thomas Jungblut
> >> > <[email protected]> wrote:
> >> > >>
> >> > >> It's a gap in experience, Thomas.
> >> > >
> >> > > Most probably you should read some good books on data extraction and then
> >> > > choose your tools accordingly.
> >> > > I never think that BSP is and will be a good extraction technique for
> >> > > unstructured data.
> >> > >
> >> > > But these are just my two cents here- there seems to be somewhat more
> >> > > political problems in this game than using tools appropriately.
> >> > >
> >> > > 2012/12/10 Thomas Jungblut <[email protected]>
> >> > >
> >> > >> Yes, if you preprocess your data correctly.
> >> > >> I have done the same unstructured extraction with the movie database from
> >> > >> IMDB and it worked fine.
> >> > >> That's just not a job for BSP, but for MapReduce.
> >> > >>
> >> > >> 2012/12/10 Edward J. Yoon <[email protected]>
> >> > >>
> >> > >>> It's a gap in experience, Thomas. Do you think you can extract Twitter
> >> > >>> mention graph using parseVertex?
> >> > >>>
> >> > >>> On Tue, Dec 11, 2012 at 4:34 AM, Thomas Jungblut
> >> > >>> <[email protected]> wrote:
> >> > >>> > I have trouble understanding you here.
> >> > >>> >
> >> > >>> > How can I generate large sample without coding?
> >> > >>> >
> >> > >>> > Do you mean random data generation or real-life data?
> >> > >>> > Personally I think it is really convenient to transform unstructured data
> >> > >>> > in a text file to vertices.
> >> > >>> >
> >> > >>> > 2012/12/10 Edward <[email protected]>
> >> > >>> >
> >> > >>> >> I mean, With or without input reader. How can I generate large sample
> >> > >>> >> without coding?
> >> > >>> >>
> >> > >>> >> It's unnecessary feature. As I mentioned before, only good for simple and
> >> > >>> >> small test.
> >> > >>> >>
> >> > >>> >> Sent from my iPhone
> >> > >>> >>
> >> > >>> >> On Dec 11, 2012, at 3:38 AM, Thomas Jungblut <[email protected]>
> >> > >>> >> wrote:
> >> > >>> >>
> >> > >>> >> >>
> >> > >>> >> >> In my case, generating test data is very annoying.
> >> > >>> >> >
> >> > >>> >> > Really? What is so difficult to generate tab separated text data?;)
> >> > >>> >> > I think we shouldn't do this, but there seems to be very little interest in
> >> > >>> >> > the community so I will not block your work on it.
> >> > >>> >> >
> >> > >>> >> > Good luck ;)
> >> > >>> >>
> >> > >>>
> >> > >>> --
> >> > >>> Best Regards, Edward J. Yoon
> >> > >>> @eddieyoon
> >> > >>>
> >> > >>
> >> >
> >> > --
> >> > Best Regards, Edward J. Yoon
> >> > @eddieyoon
> >> >
> >>
> >
>
> --
> Best Regards, Edward J. Yoon
> @eddieyoon
>
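
To make the run-time partitioning point from Suraj's message concrete, here is a minimal, language-neutral sketch in Python rather than Hama's Java API. It assumes a pluggable partitioner function that maps a vertex to a task index, with hash partitioning as the default and a property-based rule (vertex color) as an override. The names `stable_hash`, `hash_partition`, `color_partition`, and `partition`, and the dict-shaped vertices, are hypothetical and do not correspond to any Hama class.

```python
from collections import defaultdict
from zlib import crc32


def stable_hash(text):
    """Deterministic hash; Python's built-in hash() of str is salted per process."""
    return crc32(text.encode("utf-8"))


def hash_partition(vertex, num_tasks):
    """Default rule (assumed): hash of the vertex ID modulo the number of tasks."""
    return stable_hash(vertex["id"]) % num_tasks


def color_partition(vertex, num_tasks):
    """Hypothetical custom rule: all vertices of one color land on the same task."""
    return stable_hash(vertex["color"]) % num_tasks


def partition(vertices, num_tasks, partitioner=hash_partition):
    """Group vertex IDs by the task index that the partitioner assigns."""
    tasks = defaultdict(list)
    for vertex in vertices:
        tasks[partitioner(vertex, num_tasks)].append(vertex["id"])
    return dict(tasks)


if __name__ == "__main__":
    # Illustrative data only; not taken from the thread.
    sample = [
        {"id": "a", "color": "red"},
        {"id": "b", "color": "blue"},
        {"id": "c", "color": "red"},
    ]
    print(partition(sample, 4))                   # default hash rule
    print(partition(sample, 4, color_partition))  # property-based rule
```

Because the rule is just a function argument, a job that configures nothing falls back to the hash default in this sketch. A real system could equally skip the partitioning step altogether when no partitioner is configured, as Suraj suggests.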
