Hi Ahmet,

I primarily want three things:

1. To keep # and @ as part of each token. The standard tokenizer
generally strips them off.
2. When a string is tokenized, I want to keep only the tokens that are
#tags and @mentions.
3. I understand there is PatternTokenizer, but I wanted to leverage the
twitter-text project on GitHub because I trust their regexes more than
my own.

Beyond these three, I also need to control which special characters are
stripped from my string during tokenization.
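
To make the requirements concrete, here is a minimal plain-Java sketch of
the behaviour I am after, with no Lucene dependency. The class name
TagMentionExtractor and the pattern are hypothetical and deliberately
simplified; this is not the twitter-text regex, which handles many more
edge cases. The same idea could presumably be wrapped in a custom
Tokenizer, or the pattern handed to PatternTokenizer with a non-negative
group so that the matches (rather than the gaps between them) become
tokens.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TagMentionExtractor {

    // Simplified, hypothetical pattern (NOT the twitter-text regex):
    // a '#' or '@' that is not preceded by a letter, digit, or underscore,
    // followed by one or more letters, digits, or underscores.
    // The lookbehind keeps e-mail addresses such as sid@example.com
    // from being read as mentions.
    private static final Pattern TAG_OR_MENTION =
            Pattern.compile("(?<![\\p{L}\\p{N}_])[#@][\\p{L}\\p{N}_]+");

    // Returns only the #tags and @mentions, with the leading symbol kept.
    public static List<String> extract(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TAG_OR_MENTION.matcher(text);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [#Lucene, @ahmet]
        System.out.println(
                extract("Loving #Lucene, thanks @ahmet! Mail sid@example.com"));
    }
}
```

Everything that is not a #tag or @mention is dropped, and which characters
survive inside a token is decided entirely by the character class in the
pattern, which is the kind of control I am looking for.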

Please let me know of your views.

Regards,

Sid.

On Sun, Sep 27, 2015 at 5:21 PM, Ahmet Arslan <[email protected]>
wrote:

> Hi Sid,
>
> Can you provide us more details?
>
> Usually you can get away without a custom tokenizer; there may be other
> tricks to achieve your requirements.
>
> Ahmet
>
>
>
> On Sunday, September 27, 2015 11:29 PM, Siddhartha Singh Sandhu <
> [email protected]> wrote:
>
>
>
> Hi Everyone,
>
> I wanted to write a custom tokenizer and wanted a generic direction and
> some guidance on how I should go about achieving this goal.
>
> Your input will be much appreciated.
>
> Regards,
>
> Sid.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
>
>
