That is great, Simone! I have not tried your suggestion yet, but I will
surely try it.

@Robert, thank you; I will try that option too.

On Mon, Jun 25, 2012 at 3:12 PM, Simone Leo <sim...@crs4.it> wrote:

> Hello,
>
> we recently added to Pydoop a tool for solving relatively simple problems
> like this one. The tool is called Pydoop Script:
>
> http://pydoop.sourceforge.net/docs/pydoop_script.html#pydoop-script
>
> Using Pydoop Script, I implemented the transposer in 14 lines of code:
>
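> import struct
>
> def mapper(key, value, writer):
>     # emit one (column index, "row key<TAB>element") pair per matrix element
>     value = value.split()
>     for i, a in enumerate(value):
>         writer.emit(struct.pack(">q", i), "%s\t%s" % (key, a))
>
> def reducer(key, ivalue, writer):
>     # key is one packed column index; ivalue holds every element of that column
>     vector = []
>     for v in ivalue:
>         # rsplit on the last tab: the packed row key may itself contain a tab
>         # byte, while the element (from value.split()) never does
>         v = v.rsplit("\t", 1)
>         v[0] = struct.unpack(">q", v[0])[0]
>         vector.append(v)
>     vector.sort()  # restore the original row order
>     vector = [v[1] for v in vector]
>     writer.emit(struct.unpack(">q", key)[0], "\t".join(vector))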
>
> Here is the complete workflow:
>
> hadoop fs -put matrix.txt{,}
> pydoop script transpose.py matrix.txt t_matrix
> hadoop fs -get t_matrix{,}
> sort -mn -k1,1 -o t_matrix.txt t_matrix/part-0000*
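To see why the mapper/reducer pair above produces a transpose, the job can be simulated on a single machine. The sketch below is an illustration only, not Pydoop code: `ListWriter` and `run_local` are hypothetical helpers, and it assumes the input key is a plain integer byte offset rather than the packed `struct` value the real job handles. The "shuffle" is modelled by grouping map output by column index, and the final sort by column index mirrors the `sort -n -k1,1` step (dropping the index column).

```python
from collections import defaultdict


class ListWriter:
    """Hypothetical stand-in for the Pydoop Script writer: records emitted pairs."""

    def __init__(self):
        self.pairs = []

    def emit(self, key, value):
        self.pairs.append((key, value))


def mapper(key, value, writer):
    # key: the line's byte offset (a plain int here); value: one matrix row
    for i, a in enumerate(value.split()):
        writer.emit(i, (key, a))  # column index -> (row offset, element)


def reducer(key, ivalue, writer):
    # ivalue: every (row offset, element) pair for column `key`;
    # sorting by row offset restores the original row order
    vector = sorted(ivalue)
    writer.emit(key, "\t".join(a for _, a in vector))


def run_local(lines):
    """Simulate map -> shuffle (group by key) -> reduce -> final sort in memory."""
    map_out = ListWriter()
    offset = 0
    for line in lines:
        mapper(offset, line, map_out)
        offset += len(line) + 1  # +1 for the newline; equals bytes for ASCII input
    groups = defaultdict(list)
    for k, v in map_out.pairs:
        groups[k].append(v)
    red_out = ListWriter()
    for k in sorted(groups):
        reducer(k, groups[k], red_out)
    # sort by column index (like `sort -n -k1,1`), then drop the index itself
    return [row for _, row in sorted(red_out.pairs)]


matrix = ["col1 col2 col3", "col4 col5 col6", "col7 col8 col9", "col10 col11 col12"]
for row in run_local(matrix):
    print(row)
```

Each reduce call sees one whole column, which is why the approach scales with the number of columns rather than needing the full matrix on one node.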
>
> The final t_matrix.txt actually contains an additional first column with
> row indexes that should be removed (but this can probably be avoided if the
> transposed matrix acts as input for another job). Although the above
> implementation can be improved in several ways, it took me just about 30
> minutes to write and test after seeing your message.
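Removing the extra index column from the merged file is a one-pass filter. This is a minimal sketch, assuming each line has the tab-separated layout the reducer writes (`<index>\t<v1>\t<v2>...`); `strip_index_column` is a hypothetical name, not part of Pydoop.

```python
def strip_index_column(lines):
    """Yield each row of the merged output without its leading index field."""
    for line in lines:
        # partition splits on the first tab only, so the remaining fields stay intact
        _, _, rest = line.rstrip("\n").partition("\t")
        yield rest


sample = ["0\tcol1\tcol4\tcol7\tcol10\n", "1\tcol2\tcol5\tcol8\tcol11\n"]
for row in strip_index_column(sample):
    print(row)
```

On the real file, passing `open("t_matrix.txt")` instead of `sample` works the same way; from the shell, `cut -f2- t_matrix.txt` should give an equivalent result for well-formed lines.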
>
> Cheers
>
> Simone
>
>
> On 06/21/2012 10:16 AM, Subir S wrote:
>
>> Hi,
>>
>> Is it possible to implement a transpose operation, turning rows into
>> columns and vice versa?
>>
>> For example:
>>
>> col1 col2 col3
>> col4 col5 col6
>> col7 col8 col9
>> col10 col11 col12
>>
>> Can this be converted to:
>>
>> col1 col4 col7 col10
>> col2 col5 col8 col11
>> col3 col6 col9 col12
>>
>> Is this even possible with MapReduce? If so, which language makes it
>> faster to achieve?
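For a matrix small enough to fit in one machine's memory, the requested transform needs no MapReduce at all, and a plain in-memory version is handy as a reference for checking a distributed job's output on small inputs. A minimal sketch, assuming whitespace-separated rows of equal length; `transpose` is an illustrative name.

```python
def transpose(rows):
    """Transpose a whitespace-separated matrix that fits in memory."""
    cells = [line.split() for line in rows]
    # zip(*cells) groups the i-th element of every row, i.e. column i;
    # note that zip stops at the shortest row, so ragged input is truncated
    return [" ".join(column) for column in zip(*cells)]


matrix = ["col1 col2 col3", "col4 col5 col6", "col7 col8 col9", "col10 col11 col12"]
for row in transpose(matrix):
    print(row)
```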
>>
>> Thanks
>>
>>
> --
> Simone Leo
> Data Fusion - Distributed Computing
> CRS4
> POLARIS - Building #1
> Piscina Manna
> I-09010 Pula (CA) - Italy
> e-mail: simone....@crs4.it
> http://www.crs4.it
