If you can fix the BSPJobClient.partition() method so that it partitions text input, please do.
Again ... :/

>>> * If we have VertexInputReader again, we don't need to apply it to all
>>> examples. And, random generators and examples should be managed
>>> together now.

As we discussed, I'll clean them up tomorrow.

On Tue, Dec 11, 2012 at 7:21 AM, Edward J. Yoon <[email protected]> wrote:
>> Please do me a favor and code how you want the partitioning BSP job to work
>> before removing everything. I will tell you how to use the readers without
>> any graph duplicate code so you don't need to touch the examples at all.
>
> You don't need to wait, because it will be almost the same as the
> BSPJobClient.partition() method.
>
> On Tue, Dec 11, 2012 at 6:59 AM, Thomas Jungblut
> <[email protected]> wrote:
>> Please do me a favor and code how you want the partitioning BSP job to work
>> before removing everything. I will tell you how to use the readers without
>> any graph duplicate code so you don't need to touch the examples at all.
>>
>> 2012/12/10 Edward J. Yoon <[email protected]>
>>
>>> Please review
>>> https://issues.apache.org/jira/secure/attachment/12560155/patch_v02.txt
>>> first.
>>>
>>> * If we have VertexInputReader again, we don't need to apply it to all
>>> examples. And, random generators and examples should be managed
>>> together now.
>>>
>>> On Tue, Dec 11, 2012 at 6:52 AM, Thomas Jungblut
>>> <[email protected]> wrote:
>>> > Yes, but in patches and in issue HAMA-531, so we can review.
>>> >
>>> > 2012/12/10 Edward J. Yoon <[email protected]>
>>> >
>>> >> We talked on gtalk, the conclusion is as below:
>>> >>
>>> >> "If there's no opinion, I'll remove VertexInputReader in
>>> >> GraphJobRunner, because it makes the code complex. Let's consider the
>>> >> VertexInputReader again after fixing the HAMA-531 and HAMA-632
>>> >> issues."
>>> >>
>>> >> I'll clean them up tomorrow.
>>> >>
>>> >> On Tue, Dec 11, 2012 at 4:58 AM, Suraj Menon <[email protected]>
>>> >> wrote:
>>> >> > Hi Edward, I am assuming that you want to do this because you want to
>>> >> > run the job using more BSP tasks in parallel to reduce the memory
>>> >> > usage per task and perhaps run it faster.
>>> >> > Am I right? I am +1 if this makes things faster. However, this would be
>>> >> > expensive for people with smaller clusters, and we should have spill,
>>> >> > cache and lookup implemented for Vertices in such cases.
>>> >> >
>>> >> > Regarding backward compatibility, can we use the user's
>>> >> > VertexInputReader to read the data and then write it in the sequential
>>> >> > file format we want? I was discussing this with Thomas and we felt this
>>> >> > could be done by configuring a default input reader and overriding the
>>> >> > same by configuration. We would have to make the Vertex class Writable.
>>> >> > I would like to keep it backward compatible. Is this a possibility?
>>> >> >
>>> >> > Regarding run-time partitioning, not all partitioning would be based on
>>> >> > hash partitioning. I can have a partitioner based on the color of the
>>> >> > vertex or some other property of the vertex. It is a step we can skip
>>> >> > if not configured by the user.
>>> >> >
>>> >> > Just my 2 cents. We can deprecate things but let's not remove them
>>> >> > immediately.
>>> >> >
>>> >> > -Suraj
>>> >> >
>>> >> > HAMA-632 can wait until everything is resolved. I am trying to reduce
>>> >> > the API complexity.
>>> >> >
>>> >> > On Mon, Dec 10, 2012 at 2:56 PM, Thomas Jungblut
>>> >> > <[email protected]> wrote:
>>> >> >
>>> >> >> You didn't get the use of the reader.
>>> >> >> The reader doesn't care about the input format.
>>> >> >> It just takes the input as Writable, so for Text this is
>>> >> >> LongWritable/Text pairs. For NoSQL this might be
>>> >> >> LongWritable/BytesWritable.
>>> >> >>
>>> >> >> It's up to you to code this for your input sequence, not for each
>>> >> >> format. This is not hardcoded to text, only in the examples.
>>> >> >>
>>> >> >> 2012/12/10 Edward J. Yoon <[email protected]>
>>> >> >>
>>> >> >> > Again ... Users can create their own InputFormatter to read records
>>> >> >> > as a <Writable, ArrayWritable> from a text file, a sequence file, or
>>> >> >> > NoSQLs.
>>> >> >> >
>>> >> >> > You can use K, V pairs and a sequence file. Why do you want to use a
>>> >> >> > text file? Should I always write text files and parse them using
>>> >> >> > VertexInputReader?
>>> >> >> >
>>> >> >> > On Tue, Dec 11, 2012 at 4:48 AM, Thomas Jungblut
>>> >> >> > <[email protected]> wrote:
>>> >> >> > >>
>>> >> >> > >> It's a gap in experience, Thomas.
>>> >> >> > >
>>> >> >> > > Most probably you should read some good books on data extraction
>>> >> >> > > and then choose your tools accordingly.
>>> >> >> > > I don't think that BSP is or will be a good extraction technique
>>> >> >> > > for unstructured data.
>>> >> >> > >
>>> >> >> > > But these are just my two cents here; there seem to be somewhat
>>> >> >> > > more political problems in this game than using tools appropriately.
>>> >> >> > >
>>> >> >> > > 2012/12/10 Thomas Jungblut <[email protected]>
>>> >> >> > >
>>> >> >> > >> Yes, if you preprocess your data correctly.
>>> >> >> > >> I have done the same unstructured extraction with the movie
>>> >> >> > >> database from IMDB and it worked fine.
>>> >> >> > >> That's just not a job for BSP, but for MapReduce.
>>> >> >> > >>
>>> >> >> > >> 2012/12/10 Edward J. Yoon <[email protected]>
>>> >> >> > >>
>>> >> >> > >>> It's a gap in experience, Thomas. Do you think you can extract
>>> >> >> > >>> the Twitter mention graph using parseVertex?
>>> >> >> > >>>
>>> >> >> > >>> On Tue, Dec 11, 2012 at 4:34 AM, Thomas Jungblut
>>> >> >> > >>> <[email protected]> wrote:
>>> >> >> > >>> > I have trouble understanding you here.
>>> >> >> > >>> >
>>> >> >> > >>> > How can I generate a large sample without coding?
>>> >> >> > >>> >
>>> >> >> > >>> > Do you mean random data generation or real-life data?
>>> >> >> > >>> > Personally I think it is really convenient to transform
>>> >> >> > >>> > unstructured data in a text file to vertices.
>>> >> >> > >>> >
>>> >> >> > >>> > 2012/12/10 Edward <[email protected]>
>>> >> >> > >>> >
>>> >> >> > >>> >> I mean, with or without an input reader. How can I generate a
>>> >> >> > >>> >> large sample without coding?
>>> >> >> > >>> >>
>>> >> >> > >>> >> It's an unnecessary feature. As I mentioned before, it is only
>>> >> >> > >>> >> good for simple, small tests.
>>> >> >> > >>> >>
>>> >> >> > >>> >> Sent from my iPhone
>>> >> >> > >>> >>
>>> >> >> > >>> >> On Dec 11, 2012, at 3:38 AM, Thomas Jungblut
>>> >> >> > >>> >> <[email protected]> wrote:
>>> >> >> > >>> >>
>>> >> >> > >>> >> >> In my case, generating test data is very annoying.
>>> >> >> > >>> >> >
>>> >> >> > >>> >> > Really? What is so difficult about generating tab-separated
>>> >> >> > >>> >> > text data? ;)
>>> >> >> > >>> >> > I think we shouldn't do this, but there seems to be very
>>> >> >> > >>> >> > little interest in the community so I will not block your
>>> >> >> > >>> >> > work on it.
>>> >> >> > >>> >> >
>>> >> >> > >>> >> > Good luck ;)
>>> >> >> > >>>
>>> >> >> > >>> --
>>> >> >> > >>> Best Regards, Edward J. Yoon
>>> >> >> > >>> @eddieyoon

--
Best Regards, Edward J. Yoon
@eddieyoon
