Hi Ahmet,

I primarily want three things:

1. To keep # and @ as part of the tokens. The standard tokenizer generally strips them off.
2. After a string is tokenized, to keep only the tokens that are #tags and @mentions.
3. I understand there is PatternTokenizer, but I wanted to leverage the twitter-text project on GitHub because I trust their regex more than my own.

Beyond these three, I also need to control which special characters are stripped from my string during tokenization.

Please let me know your views.

Regards,
Sid.

On Sun, Sep 27, 2015 at 5:21 PM, Ahmet Arslan <[email protected]> wrote:

> Hi Sid,
>
> Can you provide us more details?
>
> Usually you can get away without a custom tokenizer, there may be other
> tricks to achieve your requirements.
>
> Ahmet
>
> On Sunday, September 27, 2015 11:29 PM, Siddhartha Singh Sandhu <
> [email protected]> wrote:
>
> Hi Everyone,
>
> I wanted to write a custom tokenizer and wanted a generic direction and
> some guidance on how I should go about achieving this goal.
>
> Your input will be much appreciated.
>
> Regards,
>
> Sid.
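
P.S. To make points 1 and 2 concrete, here is a rough Python sketch of the output I am after. It is only an illustration of the desired behavior, not Lucene code. The pattern is a simplified stand-in I made up for this example, not the twitter-text regex, and the function name is hypothetical.

```python
import re

# Simplified stand-in pattern (an assumption, NOT the twitter-text regex):
# '#' or '@' followed by word characters, and not preceded by a word
# character, so the '@' inside an e-mail address is not treated as a mention.
TAG_OR_MENTION = re.compile(r"(?<!\w)[#@]\w+")


def keep_tags_and_mentions(text):
    """Return only the #hashtag and @mention tokens, in order of appearance."""
    return TAG_OR_MENTION.findall(text)


print(keep_tags_and_mentions("Thanks @ahmet, trying #lucene and #solr today!"))
# prints ['@ahmet', '#lucene', '#solr']
```

In other words, the # and @ prefixes survive tokenization, and everything that is not a tag or a mention is dropped.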
