unigrams > 3 = 384 MB dictionary... With all n-grams (pruned by LLR > 1) we might hit some 5-10 GB of entries. With an average of some 25 chars per 5-gram, is it safe to say that we could easily hit 100 million rows?
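A rough back-of-the-envelope check of that guess, as a sketch only. It assumes decimal gigabytes, one byte per character, and that every entry is about as long as the 25-char 5-gram average (shorter n-grams would push the row count higher). The helper name `estimated_rows` is made up for illustration.

```python
def estimated_rows(total_bytes, avg_entry_bytes):
    """Rough row count: total entry size divided by average entry size."""
    return total_bytes // avg_entry_bytes


GB = 10**9            # assumption: decimal gigabytes
AVG_ENTRY_BYTES = 25  # assumption: ~25 chars per entry, 1 byte per char

low = estimated_rows(5 * GB, AVG_ENTRY_BYTES)    # 5 GB of entries
high = estimated_rows(10 * GB, AVG_ENTRY_BYTES)  # 10 GB of entries

print(f"{low:,} to {high:,} rows")
```

Under those assumptions both ends of the 5-10 GB range land well above 100 million rows, so the estimate looks conservative if anything.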
Robin