On 4 January 2011 16:16, Josep M. Fontana <[email protected]> wrote:
> Sorry if you receive this message more than once. I tried to send it
> from another address and it bounced saying I was not a member of the
> list.
> ------------
>
> Hi,
>

Hi.

> I was trying to convert an Apertium formatted dictionary into a format
> usable by Freeling. From what I saw in
> <http://wiki.apertium.org/wiki/Freeling> the tool to do that is a script
> called 'dix-to-maco.py'. I've tried the conversion but I'm having no
> luck. I wonder whether someone in this list can help me.
>
> OK, here's what I did. I have a file called
> 'apertium-oldca-XX.oldca.dix'. Following instructions from Mikel
> Forcada, I used 'lt-expand' from 'lttoolbox' to expand the dictionary
> which I piped into a file called dict.txt. I put this file together with
> the file 'es-tags.parole.txt' in the same directory and I ran
> 'dix-to-maco.py' as follows:
>
> $./dix-to-maco.py -l dict.txt es-tags.parole.txt
>

'es-tags.parole.txt' will not work. 'oldca' is not part of Apertium,
per se, and uses its own tagset. You will need to supply a mapping of
that tagset to Parole to get usable output.

>
> The program ran apparently without any problems (no error messages) but
> nothing happened. I don't see any new file produced as output and
> nothing has changed in the dict.txt file. I have not been able to find
> any documentation for 'dix-to-maco.py'

It's right there, in the file:

#
# This is a conversion script for the format outputted by lt-expand to the
# format accepted by the freeling indexdict program.
#
# Input is an expanded Apertium dictionary:
#
#    tadoù:tad<n><m><sg>
#
# Output is a Freeling dictionary:
#
#    tadoù tag NCMPV0
#
# To convert the Apertium tagset into a PAROLE-compatible tagset, a file
# with the parole tag and Apertium tag list is used. The two are separated
# by a tab:
#
#    NCMPV0	<n><m><sg>
#

> and so I don't know whether I have
> to use any particular syntax to obtain an output file with the converted
> format for the dictionary. I tried changing -l for -m or -n with the
> same results. I don't really know what these parameters do but I just tried.
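
To make the header comment concrete, here is a minimal sketch of the conversion it describes. It is not the code in dix-to-maco.py: it is written for Python 3, the names load_tag_map and convert_line are made up for illustration, and it assumes the output columns are surface form, lemma and PAROLE tag (I read the middle column 'tag' in the comment's example as standing for the lemma 'tad'). Lines that do not have the plain 'surface:lemma<tags>' shape, or whose tags are missing from the mapping, are simply skipped.

```python
import re

def load_tag_map(lines):
    """Parse 'PAROLE<TAB>apertium-tags' lines into {apertium_tags: parole}."""
    tag_map = {}
    for line in lines:
        line = line.rstrip("\n")
        if "\t" not in line:
            continue  # blank or malformed mapping line
        parole, apertium = line.split("\t", 1)
        tag_map[apertium.strip()] = parole.strip()
    return tag_map

# Hypothetical pattern: a lemma without '<', '>' or ':' followed by one or more <tag>s.
ANALYSIS = re.compile(r"([^<>:]*)((?:<[^>]+>)+)$")

def convert_line(line, tag_map):
    """Turn 'surface:lemma<tags>' into 'surface lemma PAROLE', or None if unusable."""
    surface, sep, analysis = line.strip().partition(":")
    if not sep:
        return None  # no ':' separator, so not an expanded entry
    match = ANALYSIS.match(analysis)
    if match is None:
        return None  # some other shape; this sketch does not handle it
    lemma, tags = match.groups()
    parole = tag_map.get(tags)
    if parole is None:
        return None  # no mapping for this tag sequence
    return f"{surface} {lemma} {parole}"

# The mapping line and entry are the ones quoted in the header comment.
tag_map = load_tag_map(["NCMPV0\t<n><m><sg>\n"])
print(convert_line("tadoù:tad<n><m><sg>", tag_map))
```

The last branch is the reason a Spanish mapping file cannot help with 'oldca': every tag sequence that is absent from the lookup produces no output at all.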
>
> Then I realized that if I did '$python dix-to-maco.py' I got the message:
>
> Usage: ./dix-to-maco.py [-l|-m|-n] <dix file> <parole lookup>
>
> So I tried with the name of the original .dix file (not expanded with
> lt-expand) but I also got a bunch of errors:
>
> jfont...@ubuntu:~/Downloads/apertium-oldca-XX-0.7$ ./dix-to-maco.py -l
> apertium-oldca-XX.oldca.dix es-tags.parole.txt
> Traceback (most recent call last):
>   File "./dix-to-maco.py", line 249, in <module>
>     print key + ' ' + analyses;
>   File "/usr/lib/python2.6/codecs.py", line 351, in write
>     data, consumed = self.encode(object, self.errors)
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 7:
> ordinal not in range(128)

That's the mating call of 'Badly Written Python'/'Python Sucks At
Unicode'. I'm having a look at it.

>
> -----------------------
>
> jfont...@ubuntu:~/Downloads/apertium-oldca-XX-0.7$ ./dix-to-maco.py -m
> apertium-oldca-XX.oldca.dix es-tags.parole.txt
> casem-s'ho casar+es+ho
> Traceback (most recent call last):
>   File "./dix-to-maco.py", line 249, in <module>
>     print key + ' ' + analyses;
>   File "/usr/lib/python2.6/codecs.py", line 351, in write
>     data, consumed = self.encode(object, self.errors)
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 8:
> ordinal not in range(128)
>
> --------------------------
>
> jfont...@ubuntu:~/Downloads/apertium-oldca-XX-0.7$ ./dix-to-maco.py -n
> apertium-oldca-XX.oldca.dix es-tags.parole.txt
> Nuakchott
> Traceback (most recent call last):
>   File "./dix-to-maco.py", line 249, in <module>
>     print key + ' ' + analyses;
>   File "/usr/lib/python2.6/codecs.py", line 351, in write
>     data, consumed = self.encode(object, self.errors)
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 7:
> ordinal not in range(128)
>
> ---
>
> Finally I tried by changing the extension of the file created with
> lt-expand to .dix, which also gave me an error.
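
On the three UnicodeDecodeError tracebacks above: byte 0xc3 is the first byte of the UTF-8 encoding of most accented Latin letters, so my guess (not yet checked against the script) is that undecoded UTF-8 bytes are reaching a writer that falls back to ASCII. The sketch below reproduces that class of failure explicitly and shows the usual cure, which is to decode the input as UTF-8 once, when it is read. It is Python 3, whereas the script runs on Python 2.6, and the sample entry and the helper name decode_failure are invented for illustration.

```python
def decode_failure(raw, codec):
    """Return the UnicodeDecodeError raised when decoding raw with codec, or None."""
    try:
        raw.decode(codec)
    except UnicodeDecodeError as err:
        return err
    return None

# Hypothetical dictionary entry containing an accented letter, as UTF-8 bytes.
raw = "això:això<prn>".encode("utf-8")

# Treating the bytes as ASCII fails on the first non-ASCII byte.
err = decode_failure(raw, "ascii")
print(err.reason, hex(raw[err.start]))

# Decoding with the encoding the file is actually in succeeds.
text = raw.decode("utf-8")
print(text)
```

The position reported in each traceback is just the offset of the first accented letter in the line being printed, which is why it varies between 7 and 8.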
>
> jfont...@ubuntu:~/Downloads/apertium-oldca-XX-0.7$ ./dix-to-maco.py -lm
> dict-freeling.txt es-tags.parole.txt
> Traceback (most recent call last):
>   File "./dix-to-maco.py", line 222, in <module>
>     analysis = row[1].strip();
> IndexError: list index out of range
>
> ----------
>
> Obviously I'm doing something wrong. Could anybody lend me a hand with
> this? Thanks in advance.
>
>
> Josep M.
>
> _______________________________________________
> Apertium-stuff mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/apertium-stuff
>

-- 
<Leftmost> jimregan, that's because deep inside you, you are evil.
<Leftmost> Also not-so-deep inside you.
