I'm trying to understand the intuition behind the featurize method that
Aaron used in one of his demos. I believe these features only work for
detecting the character set (i.e., the language used).
Can someone help?
def featurize(s: String): Vector = {
val n = 1000
val result = new
The program computes hashed bigram frequencies, normalized by the total
number of bigrams, and then filters out zero values. Hashing is an
effective trick for vectorizing features. Take a look at
http://en.wikipedia.org/wiki/Feature_hashing
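
To make the steps concrete, here is a minimal sketch in plain Scala. It is
not the code from the demo (which is cut off above). It assumes character
bigrams, uses String.hashCode as the hash function, and returns a
Map[Int, Double] as a stand-in for a sparse vector. The object name
BigramHashing and the n parameter default are made up for illustration.

```scala
object BigramHashing {
  // Number of hash buckets; 1000 matches `n` in the snippet above.
  val NumBuckets = 1000

  // Map a string to hashed character-bigram frequencies.
  // Keys are bucket indices in [0, n); values are relative frequencies.
  // Assumption: a Map stands in for a sparse vector, so buckets with a
  // zero count are simply absent (the "filter out zero values" step).
  def featurize(s: String, n: Int = NumBuckets): Map[Int, Double] = {
    // All overlapping two-character windows; drop the short window that
    // sliding(2) yields for a one-character string.
    val bigrams = s.sliding(2).filter(_.length == 2).toList
    if (bigrams.isEmpty) Map.empty
    else {
      val total = bigrams.size.toDouble
      bigrams
        // floorMod keeps the index non-negative even for negative hash codes.
        .map(b => java.lang.Math.floorMod(b.hashCode, n))
        .groupBy(identity)
        // Normalize each bucket count by the total number of bigrams.
        .map { case (bucket, hits) => bucket -> hits.size / total }
    }
  }
}
```

For example, BigramHashing.featurize("hello world") gives a map whose
values sum to 1.0. Different bigrams can collide in the same bucket; that
is the usual trade-off of feature hashing for a fixed-size vector.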
Liquan
On Wed, Oct 1, 2014 at 2:18 PM, Soumya Simanta
Yes, the bigrams in that demo are character bigrams (two characters
each), which is enough to separate different character sets. -Xiangrui
On Wed, Oct 1, 2014 at 2:54 PM, Liquan Pei liquan...@gmail.com wrote: