Hi,
You don't need to know the total number of lines in advance. I have
attached my modified version of "main.c", which reduces the number of
sentences held in memory per iteration to 250. I tested it with
"examples/sec??.tagged", and both input files parse without any
segmentation fault.
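
As a rough illustration of the batching idea (separate from the attached file), here is a minimal, self-contained sketch. The names read_batch, process_all, BATCH_SIZE and MAX_LINE are hypothetical, and read_batch only stands in for read_sentences; the real function may behave differently. It assumes one sentence per line and that no line exceeds MAX_LINE - 1 characters.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define BATCH_SIZE 250   /* lines held in memory at once (hypothetical name) */
#define MAX_LINE   4096  /* assumed upper bound on line length */

/* Hypothetical stand-in for read_sentences(): fills up to `cap` lines
   and returns how many were read (0 once the input is exhausted). */
static int read_batch(FILE *in, char lines[][MAX_LINE], int cap)
{
    int n = 0;
    assert(cap > 0);
    while (n < cap && fgets(lines[n], MAX_LINE, in) != NULL) {
        lines[n][strcspn(lines[n], "\n")] = '\0';  /* strip trailing newline */
        n++;
    }
    return n;
}

/* Consumes the whole stream batch by batch and returns the number of
   lines seen. The total is never needed before reading starts. */
static long process_all(FILE *in)
{
    static char lines[BATCH_SIZE][MAX_LINE];
    long total = 0;
    int n;

    while ((n = read_batch(in, lines, BATCH_SIZE)) > 0) {
        /* a real caller would parse lines[0] .. lines[n-1] here */
        total += n;
    }
    return total;
}
```

Calling process_all(words) on an open file would walk through it 250 lines at a time, so memory use stays bounded by the batch size rather than by the length of the input.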
Best regards,
--
Hwidong Na <[email protected]>
KLE lab, POSTECH, KOREA
2010-10-15 (Fri), 16:02 +0200, marco turchi:
> Hi
> thanks, I'm trying to modify main.c so that it reads the
> file twice: the first time to get the number of lines and the second
> to run the parser. It is not the best solution, but if it works it can
> solve the problem.
>
> I have not yet taken the segmentation fault into account.
>
> thanks
> Marco
>
> On Fri, Oct 15, 2010 at 3:49 PM, Raphael Payen <[email protected]>
> wrote:
> I also had the same problem, and I am also interested in the
> modifications needed to avoid the segmentation fault.
>
> Since it was only a simple test when I tried it, I didn't bother
> fixing the source and wrote this script instead, which you might
> also use. It splits the input into chunks of 2500 lines and is used
> like this:
> <file split-file-wrapper.py 2500 parse-en-collins >outfile
> (But it makes the processing much slower; modifying the source would
> be better.)
>
> --
> Raphael Payen
>
>
>
> On Fri, 2010-10-15 at 14:40 +0200, marco turchi wrote:
> > Hi
> > I have the same problem with the Collins parser. Do you know exactly
> > what I need to change in the parser's source code? Or do you have a
> > modified version?
> >
> > Thanks a lot
> > Marco
> >
> > On Thu, Jun 3, 2010 at 5:03 PM, Hwidong Na <[email protected]>
> > wrote:
> > Hi,
> >
> > This is caused not by the wrapper script but by the Collins parser.
> > You can modify the source to call the read_sentences function
> > repeatedly in the file "main.c". In addition, you need to change the
> > defined values in "grammar.h" to avoid segmentation faults on long
> > sentences.
> >
> > --
> > Hwidong Na <[email protected]>
> > KLE lab, POSTECH, KOREA
> >
> >
> > 2010-05-27 (Thu), 19:20 +0800, dongxinghua0213:
> >
> > > hello,
> > > when parsing sentences with parse-en-collins.perl, I find that only
> > > 2500 parsed sentences are output, but there are more than one
> > > hundred thousand sentences. What can I do to parse all of them?
> > >
> > > thank you !
> > >
> > >
> > >
> > >
> >
> > > _______________________________________________
> > > Moses-support mailing list
> > > [email protected]
> > > http://mailman.mit.edu/mailman/listinfo/moses-support
/* This code is the statistical natural language parser described in
M. Collins. 1999. Head-Driven
Statistical Models for Natural Language Parsing. PhD Dissertation,
University of Pennsylvania.
Copyright (C) 1999 Michael Collins
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software
Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
*/
#include <stdio.h>
#include <string.h>
#include <math.h>
#include <stdlib.h>
#include <time.h>
#include <assert.h>
#include "lexicon.h"
#include "grammar.h"
#include "mymalloc.h"
#include "mymalloc_char.h"
#include "hash.h"
#include "prob.h"
#include "readevents.h"
#include "sentence.h"
#include "chart.h"
#define BUFSIZE 250
sentence_type sentences[BUFSIZE];
int main(int argc, char *argv[])
{
  int s;
  int numsentences;
  FILE *words;
  char grammar[1000];
  char buffer[1000];
  float temp;
  int npflag;
  time_t g_time;
  time_t s_time;

  if(argc!=8)
    {
      fprintf(stderr,"ERROR in command line, usage:\n cat countsfile | parser.out sentences-file grammarfile beamsize punctuation-flag distaflag distvflag npflag\n");
      return 1; /* non-zero exit status signals the usage error */
    }

  /* field widths keep long arguments from overflowing the 1000-byte buffers */
  sscanf(argv[1],"%999s",buffer);
  words=fopen(buffer,"r");
  assert(words!=NULL);
  sscanf(argv[2],"%999s",grammar);
  sscanf(argv[3],"%f",&temp);
  BEAMPROB = log(temp);
  sscanf(argv[4],"%d",&PUNC_FLAG);
  sscanf(argv[5],"%d",&DISTAFLAG);
  sscanf(argv[6],"%d",&DISTVFLAG);
  sscanf(argv[7],"%d",&npflag);
  assert(npflag==0 || npflag==1);
  set_treebankoutputflag(npflag);

  mymalloc_init();
  mymalloc_char_init();
  hash_make_table(8000007,&new_hash);
  effhash_make_table(1000003,&eff_hash);
  read_grammar(grammar);

  // numsentences=read_sentences(words,sentences,BUFSIZE);
  //
  // fprintf(stderr,"NUMSENTENCES %d\n",numsentences);

  read_events(stdin,&new_hash,-1);

  /* read and parse at most BUFSIZE sentences at a time
     until no more sentences remain */
  numsentences = 1;
  while (numsentences > 0){
    numsentences=read_sentences(words,sentences,BUFSIZE);
    fprintf(stderr,"NUMSENTENCES %d\n",numsentences);
    for(s=0;s<numsentences;s++)
      {
        time(&g_time);
        pthresh = -5000000;
        parse_sentence(&sentences[s]);
        /* print_chart();*/
        time(&s_time);
        printf("TIME %d\n",(int) (s_time-g_time));
      }
  }

  fclose(words);
  return 0; /* zero exit status on success */
}