Re: apostrophe and sentence detector

Tim Miller Mon, 26 Aug 2013 09:36:11 -0700

Ah, so we might suspect that some of those 7 lines in the file were
indeed followed by newlines in the original training data. In the
absence of more/better training data which would help us learn this, I
think it would be reasonable to restore the list of sentence-breaking
characters to not include apostrophe. Seems like it is rare for a
sentence to end on it, and my preference is to accidentally call 2
sentences one sentence, rather than splitting one sentence in the
middle. I think it's probably better for downstream processing.
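
To illustrate the tradeoff, here is a minimal sketch in Python. It is a rule-based stand-in, not the trained model the sentence detector actually uses: the function name `split_sentences` and the `eos_chars` parameter are hypothetical, and the rule (split after any candidate character followed by whitespace) is an assumption made only to show how adding the apostrophe to the candidate set can cut a sentence in the middle.

```python
import re


def split_sentences(text, eos_chars):
    """Naively split after any character in eos_chars that is followed by whitespace.

    Hypothetical helper for illustration; a trained detector would score each
    candidate position instead of always splitting.
    """
    pattern = "(?<=[" + re.escape(eos_chars) + r"])\s+"
    return [s for s in re.split(pattern, text) if s]


text = "The patient's chart was reviewed. Pt states 'no pain' today."

# Candidate set without the apostrophe: two sentences, as intended.
without_apostrophe = split_sentences(text, ".?!")

# Candidate set with the apostrophe: the second sentence is cut after 'no pain'.
with_apostrophe = split_sentences(text, ".?!'")

print(without_apostrophe)
print(with_apostrophe)
```

Note that the apostrophe in "patient's" never triggers a split here because it is not followed by whitespace; only the closing quote is affected.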

Just my .02,
Tim

On 08/26/2013 12:29 PM, Masanz, James J. wrote:

The training data is one sentence per line.
That's how you feed data to the sentence detector.
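
A quick way to check counts like this on one-sentence-per-line data is sketched below in Python. The function names and the sample lines are made up for illustration; the real training file is not shown in this thread. The second function counts apostrophes that are followed by whitespace and more text on the same line, which in this format means the sentence continued past the apostrophe.

```python
import re


def count_lines_ending_with(lines, char="'"):
    """Count lines whose last non-whitespace character is `char`."""
    return sum(1 for line in lines if line.rstrip().endswith(char))


def count_midline_apostrophes(lines):
    """Count apostrophes followed by whitespace and then more text on the same line."""
    return sum(len(re.findall(r"'\s+\S", line.strip())) for line in lines)


# Hypothetical sample in the one-sentence-per-line format.
sample = [
    "He said 'stop'",
    "No issues.",
    "Reports 'dizziness'\n",
    "Pt states 'no pain' today.",
    "",
]

ending = count_lines_ending_with(sample)
midline = count_midline_apostrophes(sample)
print(ending, midline)
```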


-----Original Message-----
From: [email protected] 
[mailto:[email protected]] On Behalf Of 
Tim Miller
Sent: Monday, August 26, 2013 11:12 AM
To: [email protected]
Subject: Re: apostrophe and sentence detector


On 08/26/2013 12:05 PM, Masanz, James J. wrote:

The recently rebuilt sentence detector (currently in trunk and the 3.1.0 
branch) is sometimes taking the apostrophe as a sentence break where the 
ctakes-3.0.0-incubating model didn't.

The training data used for the recently rebuilt model contains only 7 
lines that end with an apostrophe (single quote).

Do you mean 7 sentences that end in a single apostrophe or 7 lines? The
sentence detector will currently break on newlines no matter what, so
the important number is how many sentences end mid-line with an
apostrophe, right?
Tim
