Re: [Moses-support] collins-parser only parse 2500 !

Hwidong Na Mon, 18 Oct 2010 05:17:15 -0700

Hi, 

You don't need to know the total number of lines in advance. I have
attached my modified version of "main.c", which reduces the number of
sentences held in memory per iteration to 250. I tested it with
"examples/sec??.tagged", and both input files were parsed without any
segmentation fault.
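
For anyone who wants to try the idea outside the parser, here is a
minimal, self-contained sketch of the same batch loop. It is only an
illustration: read_batch, process_all, BATCH and MAXLEN are
hypothetical names that do not exist in the Collins parser, and plain
text lines stand in for sentences. The point is that the loop stops
when a read comes back empty, so the line count is never needed.

```c
#include <assert.h>
#include <stdio.h>

#define BATCH 250     /* lines held in memory at once (hypothetical name) */
#define MAXLEN 4096   /* assumed maximum line length, for this sketch only */

/* Read up to max lines from in into buf and return how many were read.
   A return value of 0 means the input is exhausted.
   Lines longer than MAXLEN-1 characters would be split by fgets. */
int read_batch(FILE *in, char buf[][MAXLEN], int max)
{
  int n = 0;
  assert(max > 0);
  while (n < max && fgets(buf[n], MAXLEN, in) != NULL)
    n++;
  return n;
}

/* Process the whole stream one batch at a time and return the total
   number of lines seen. The inner loop is where each line would be
   parsed; here it is a placeholder. */
long process_all(FILE *in)
{
  static char buf[BATCH][MAXLEN];
  long total = 0;
  int n, i;

  while ((n = read_batch(in, buf, BATCH)) > 0) {
    for (i = 0; i < n; i++) {
      (void)buf[i];   /* placeholder: parse buf[i] here */
    }
    total += n;
  }
  return total;
}
```

Calling process_all(fopen_result) on a file of any length touches at
most BATCH lines at a time.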


Best regards,
-- 
Hwidong Na <le...@postech.ac.kr>
KLE lab, POSTECH, KOREA


2010-10-15 (Fri), 16:02 +0200, marco turchi:
> Hi,
> Thanks. I'm trying to modify main.c so that it reads the file
> twice: the first time to get the number of lines, and the second
> to run the parser. It is not the best solution, but if it works it can
> solve the problem.
> 
> I have not yet taken the segmentation fault into account.
> 
> Thanks,
> Marco
> 
> On Fri, Oct 15, 2010 at 3:49 PM, Raphael Payen <rpa...@alphacrc.com>
> wrote:
>         I had the same problem, and I am also interested in the
>         modifications needed to avoid the segmentation fault.
>         
>         When I tried it, it was only for a simple test, so I didn't
>         bother fixing the source and wrote this script instead, which
>         you might also use. It splits the input into chunks of 2500
>         lines and is used like this:
>         <file split-file-wrapper.py 2500 parse-en-collins >outfile
>         (It makes the processing much slower, though; modifying the
>         source would be better.)
>         
>         --
>         Raphael Payen
>         
>         
>         
>         On Fri, 2010-10-15 at 14:40 +0200, marco turchi wrote:
>         > Hi,
>         > I have the same problem with the Collins parser. Do you know
>         > exactly what I need to change in the parser's source code? Or
>         > do you have a modified version?
>         >
>         > Thanks a lot,
>         > Marco
>         >
>         > On Thu, Jun 3, 2010 at 5:03 PM, Hwidong Na <le...@postech.ac.kr>
>         > wrote:
>         >         Hi,
>         >
>         >         This is caused not by the wrapper script but by the
>         >         Collins parser itself. You can modify the source to
>         >         call the read_sentences function repeatedly in the
>         >         file "main.c". In addition, you need to modify the
>         >         defined values in "grammar.h" to avoid segmentation
>         >         faults on long sentences.
>         >
>         >         --
>         >         Hwidong Na <le...@postech.ac.kr>
>         >         KLE lab, POSTECH, KOREA
>         >
>         >
>         >         2010-05-27 (Thu), 19:20 +0800, dongxinghua0213:
>         >
>         >         > Hello,
>         >         > When parsing sentences with parse-en-collins.perl, I
>         >         > find that only 2500 parsed sentences are produced, but
>         >         > there are more than one hundred thousand sentences.
>         >         > What can I do to parse all of them?
>         >         >
>         >         > Thank you!
>         >         >
>         >         > _______________________________________________
>         >         > Moses-support mailing list
>         >         > Moses-support@mit.edu
>         >         >
>         http://mailman.mit.edu/mailman/listinfo/moses-support
>         >
>         
> 
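
The chunking workaround quoted above (feeding an unmodified parser
2500 lines at a time) could look roughly like the sketch below. This
is not Raphael's split-file-wrapper.py, which is not reproduced in
this message; split_into_chunks and the "<prefix>.N" file naming are
assumptions made purely for illustration. Each resulting chunk file
would then be parsed separately and the outputs concatenated.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Copy lines from in into files named <prefix>.0, <prefix>.1, ...,
   each holding at most chunk_lines lines. Returns the number of chunk
   files written, or -1 if a chunk file could not be opened. */
int split_into_chunks(FILE *in, const char *prefix, int chunk_lines)
{
  char line[4096];
  char name[1024];
  FILE *out = NULL;
  int chunks = 0;
  int lines_in_chunk = 0;

  assert(chunk_lines > 0);
  while (fgets(line, sizeof line, in) != NULL) {
    if (out == NULL) {
      /* Start a new chunk file lazily, only when there is data for it. */
      snprintf(name, sizeof name, "%s.%d", prefix, chunks);
      out = fopen(name, "w");
      if (out == NULL)
        return -1;
      chunks++;
      lines_in_chunk = 0;
    }
    fputs(line, out);
    /* fgets may return part of a long line; only a newline ends a line. */
    if (strchr(line, '\n') != NULL && ++lines_in_chunk == chunk_lines) {
      fclose(out);
      out = NULL;
    }
  }
  if (out != NULL)
    fclose(out);
  return chunks;
}
```

As noted in the thread, restarting the parser for every chunk is much
slower than batching inside main.c, because the grammar and events are
reloaded each time.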

/* This code is the statistical natural language parser described in

   M. Collins. 1999.  Head-Driven
   Statistical Models for Natural Language Parsing. PhD Dissertation,
   University of Pennsylvania.

   Copyright (C) 1999 Michael Collins

    This program is free software; you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
    the Free Software Foundation; either version 2 of the License, or
    (at your option) any later version.

    This program is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    GNU General Public License for more details.

    You should have received a copy of the GNU General Public License
    along with this program; if not, write to the Free Software
    Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
*/

#include <stdio.h>
#include <string.h>
#include <math.h>
#include <stdlib.h>
#include <time.h>
#include <assert.h>

#include "lexicon.h"

#include "grammar.h"

#include "mymalloc.h"
#include "mymalloc_char.h"

#include "hash.h"

#include "prob.h"

#include "readevents.h"

#include "sentence.h"
#include "chart.h"

#define BUFSIZE 250
sentence_type sentences[BUFSIZE];

int main(int argc, char *argv[])
{
  int s;
  int numsentences;
  FILE *words;
  char grammar[1000];
  char buffer[1000];
  float temp;
  int npflag;

  time_t g_time;
  time_t s_time;

  if(argc!=8) 
    {
      fprintf(stderr,"ERROR in command line, usage:\n cat countsfile | parser.out sentences-file grammarfile beamsize punctuation-flag distaflag distvflag npflag\n");
      return 1;  /* non-zero exit status signals the usage error */
    }

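  /* Open the sentences file directly from argv[1]; copying it through
     sscanf("%s") could overflow buffer and truncates at whitespace.
     Report a failure explicitly, since assert vanishes under NDEBUG. */
  words=fopen(argv[1],"r");
  if(words==NULL)
    {
      perror(argv[1]);
      return 1;
    }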

  sscanf(argv[2],"%s",grammar);

  sscanf(argv[3],"%f",&temp);
  BEAMPROB = log(temp);

  sscanf(argv[4],"%d",&PUNC_FLAG);

  sscanf(argv[5],"%d",&DISTAFLAG);
  sscanf(argv[6],"%d",&DISTVFLAG);
  sscanf(argv[7],"%d",&npflag);
  assert(npflag==0 || npflag==1);
  set_treebankoutputflag(npflag);

  mymalloc_init();
  mymalloc_char_init();

  hash_make_table(8000007,&new_hash);
  effhash_make_table(1000003,&eff_hash);

  read_grammar(grammar);

//  numsentences=read_sentences(words,sentences,BUFSIZE);
//
//  fprintf(stderr,"NUMSENTENCES %d\n",numsentences);

  read_events(stdin,&new_hash,-1);

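  /* Parse the input in batches of at most BUFSIZE sentences, so the
     whole file never has to fit in the sentences array. The loop ends
     once read_sentences returns no sentences. */
  numsentences = 1;
  while (numsentences > 0){
      numsentences=read_sentences(words,sentences,BUFSIZE);
      fprintf(stderr,"NUMSENTENCES %d\n",numsentences);
      for(s=0;s<numsentences;s++)
       {
         time(&g_time);

         pthresh = -5000000;

         parse_sentence(&sentences[s]);

    /*     print_chart();*/
         time(&s_time);
         /* Report the wall-clock seconds spent on this sentence. */
         printf("TIME %d\n",(int) (s_time-g_time));
       }
  }
  fclose(words);
  return 0;  /* zero exit status signals success */
}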
