James,
We were discussing the sentence detector in person here the other day,
and Pei had a thought: depending on what sources you were using to train
the sentence detector, we might be able to do something equivalent here
in Boston by using SHARP, THYME, and MIPACQ data, which are largely from
Mayo and probably similar to what you use, then augmenting with the
little bit of MIMIC that I annotated. I don't know how that
compares size-wise to the dataset that you are using. Is yours quite
large, or do you think we would be fine using data derived from those
other projects? What do you think of this plan? Anyone else?
Tim
- training data for sentence detector Tim Miller