I had the same problem, and I am also interested in the modifications
needed to avoid the segmentation fault.
I only hit it in a simple test and didn't bother fixing the parser, so I
wrote this script instead, which you might find useful too. It splits the
input into chunks of a given number of lines (2500 here) and runs the
command once per chunk. It is used like this:
<file split-file-wrapper.py 2500 parse-en-collins >outfile
(It makes the processing much slower, though, since the command is
restarted for every chunk; modifying the source would be better.)
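
A minimal sketch of the chunking idea on its own, separate from the attached script. The helper name `chunked` is hypothetical and used only for illustration; it does not appear in the script, which writes each chunk to a freshly started process instead of collecting it in a list.

```python
from itertools import islice


def chunked(lines, size):
    """Yield successive lists of at most `size` items taken from `lines`."""
    if size <= 0:
        raise ValueError("size must be positive")
    it = iter(lines)
    while True:
        # islice consumes up to `size` items from the shared iterator.
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


if __name__ == "__main__":
    # 6000 input lines in chunks of 2500 give two full chunks and a remainder.
    print([len(c) for c in chunked(range(6000), 2500)])
```

With a chunk size of 2500, each run of the parser sees at most 2500 sentences, which is the limit reported in this thread.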
--
Raphael Payen
On Fri, 2010-10-15 at 14:40 +0200, marco turchi wrote:
> Hi
> I have the same problem with the Collins' parser. Do you know exactly
> what I need to change in the source code of the parser? Or do you have a
> modified version?
>
> Thanks a lot
> Marco
>
> On Thu, Jun 3, 2010 at 5:03 PM, Hwidong Na <[email protected]>
> wrote:
> Hi,
>
> This is not because of the wrapper script, but because of the Collins'
> parser. You can modify the source to iterate the read_sentences function
> in the file "main.c". In addition, you need to modify the defined values
> in "grammar.h" to avoid segmentation faults on long sentences.
>
> --
> Hwidong Na <[email protected]>
> KLE lab, POSTECH, KOREA
>
>
> 2010-05-27 (Thu), 19:20 +0800, dongxinghua0213:
>
> > hello,
> > when parsing sentences using parse-en-collins.perl, I find only 2500
> > parsed sentences are available, but there are more than one hundred
> > thousand sentences. What can I do to parse all of them?
> >
> > thank you !
> >
> >
> >
> >
>
> > _______________________________________________
> > Moses-support mailing list
> > [email protected]
> > http://mailman.mit.edu/mailman/listinfo/moses-support
>
>
>
>
>
#!/usr/bin/python
# Copyright Alpha CRC Ltd. distributed as GPL
# Runs a command repeatedly, feeding it stdin in chunks of <num-lines> lines.
import os
import subprocess
import sys

def usage():
    sys.stderr.write("Runs <command> repetitively, splitting input in chunks of <num-lines>\nUsage: "+os.path.basename(sys.argv[0])+" <num-lines> <command>\n")
    sys.exit(1)

if len(sys.argv) < 3:
    usage()
try:
    chunksize = int(sys.argv[1])
except ValueError:
    chunksize = 0
command = sys.argv[2:]
if chunksize <= 0:
    sys.stderr.write("invalid num\n")
    sys.exit(1)

# Read and forward raw bytes so the script runs on both Python 2 and Python 3.
stdin = getattr(sys.stdin, "buffer", sys.stdin)

def init_processpipe(c):
    # Start the command; its output goes straight to our stdout, its stderr is discarded.
    return subprocess.Popen(c, stdin=subprocess.PIPE, stdout=sys.stdout, stderr=open(os.devnull, "w"))

def communicate_and_check(p):
    # Close the command's stdin, wait for it to finish, and abort on a nonzero exit status.
    p.communicate()
    if p.returncode != 0:
        sys.stderr.write("Stopped in line "+str(numlines)+" of iteration "+str(numiter)+" (source line "+str(numiter*chunksize+numlines)+") with error: "+str(p.returncode)+"\n")
        sys.exit(p.returncode)
    sys.stdout.flush()

process = init_processpipe(command)
numlines = 0
numiter = 0
for line in stdin:
    if numlines == chunksize:
        # The current chunk is full: finish this run and start a fresh process.
        communicate_and_check(process)
        process = init_processpipe(command)
        numlines = 0
        numiter += 1
    process.stdin.write(line)
    numlines += 1
communicate_and_check(process)