That is great, Simone! I have not tried your suggestion yet, but will surely try it.
@Robert, thank you I will try that option too.

On Mon, Jun 25, 2012 at 3:12 PM, Simone Leo <sim...@crs4.it> wrote:

> Hello,
>
> we recently added a tool for solving relatively simple problems like this
> one to Pydoop. The tool is called Pydoop Script:
>
> http://pydoop.sourceforge.net/docs/pydoop_script.html#pydoop-script
>
> Using Pydoop Script, I implemented the transposer in 14 lines of code:
>
> import struct
>
> def mapper(key, value, writer):
>     value = value.split()
>     for i, a in enumerate(value):
>         writer.emit(struct.pack(">q", i), "%s\t%s" % (key, a))
>
> def reducer(key, ivalue, writer):
>     vector = []
>     for v in ivalue:
>         v = v.split("\t")
>         v[0] = struct.unpack(">q", v[0])[0]
>         vector.append(v)
>     vector.sort()
>     vector = [v[1] for v in vector]
>     writer.emit(struct.unpack(">q", key)[0], "\t".join(vector))
>
> Here is the complete workflow:
>
> hadoop fs -put matrix.txt{,}
> pydoop script transpose.py matrix.txt t_matrix
> hadoop fs -get t_matrix{,}
> sort -mn -k1,1 -o t_matrix.txt t_matrix/part-0000*
>
> The final t_matrix.txt actually contains an additional first column with
> row indexes that should be removed (but this can probably be avoided if the
> transposed matrix acts as input for another job). Although the above
> implementation can be improved in several ways, it took me just about 30
> minutes to write and test after seeing your message.
>
> Cheers
>
> Simone
>
>
> On 06/21/2012 10:16 AM, Subir S wrote:
>
>> Hi,
>>
>> Is it possible to implement transpose operation of rows into columns and
>> vice versa...
>>
>> i.e.
>>
>> col1 col2 col3
>> col4 col5 col6
>> col7 col8 col9
>> col10 col11 col12
>>
>> can this be converted to
>>
>> col1 col4 col7 col10
>> col2 col5 col8 col11
>> col3 col6 col9 col12
>>
>> Is this even possible with map reduce? If yes, which language helps to
>> achieve this faster?
>>
>> Thanks
>>
>
> --
> Simone Leo
> Data Fusion - Distributed Computing
> CRS4
> POLARIS - Building #1
> Piscina Manna
> I-09010 Pula (CA) - Italy
> e-mail: simone....@crs4.it
> http://www.crs4.it
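
The map and reduce steps in the quoted script can also be tried locally, without Hadoop or Pydoop. What follows is only a minimal sketch under stated assumptions: the names map_row, reduce_column and transpose_local are hypothetical and not part of Pydoop, the shuffle phase is simulated with an in-memory dictionary, and a plain row counter stands in for the key the mapper receives.

```python
from collections import defaultdict


def map_row(row_id, line):
    """Emit (column index, (row id, cell)) for every cell of one input line."""
    for col, cell in enumerate(line.split()):
        yield col, (row_id, cell)


def reduce_column(col, pairs):
    """Sort the cells of one column by row id and join them into one output row."""
    ordered = sorted(pairs, key=lambda pair: pair[0])
    return col, "\t".join(cell for _, cell in ordered)


def transpose_local(lines):
    """Run map, shuffle (group by key) and reduce in memory. Illustration only."""
    groups = defaultdict(list)
    # Assumption: a simple row counter replaces the mapper's input key.
    for row_id, line in enumerate(lines):
        for col, pair in map_row(row_id, line):
            groups[col].append(pair)
    # Emit the transposed rows in column order, dropping the index column.
    return [reduce_column(col, groups[col])[1] for col in sorted(groups)]


if __name__ == "__main__":
    matrix = [
        "col1 col2 col3",
        "col4 col5 col6",
        "col7 col8 col9",
        "col10 col11 col12",
    ]
    for row in transpose_local(matrix):
        print(row)
```

Grouping by column index and then sorting by row id mirrors what the quoted mapper and reducer do. Because the sketch returns only the joined cells, the extra first column of row indexes mentioned in the quoted message does not appear here.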