OK, so I thought I'd cracked this, and I have to some extent, but not completely.

The well-known hashtag RegEx I see in most Twitter examples is a variant of 
this (the doubled ## is just CF's escape for a literal #):

##([a-z0-9\-_]+)

The problem with this is that if you run it over a string that contains HTML 
entities, it will recognise those HTML entities as hashtags too...

E.g. #mytrendingtopic is a valid hashtag, but the #39 inside the HTML entity 
&#39; (an apostrophe) isn't.

With the RegEx above, #mytrendingtopic *and* #39 are both recognised as 
hashtags...
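
To make the problem concrete, here is a minimal sketch in Python (used purely 
for illustration; CF's RegEx engine may differ in details). The sample string 
is made up, and the pattern uses a single # because Python needs no escape.

```python
import re

# The commonly seen hashtag pattern, case-insensitive.
SIMPLE = re.compile(r"#([a-z0-9\-_]+)", re.IGNORECASE)

# Hypothetical sample text containing the numeric entity &#39; (an apostrophe).
text = "I can&#39;t wait for #mytrendingtopic"

# findall returns the contents of the single capture group for each match.
matches = SIMPLE.findall(text)
print(matches)  # ['39', 'mytrendingtopic']
```

The "39" comes from the entity, not from anything a user typed as a hashtag.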

Now, my latest crack at this is a bit more complex and looks like this:

##(([a-z_\-]+[0-9_\-]*[a-z0-9_\-]+)|([0-9_\-]+[a-z_\-]+[a-z0-9_\-]+))

Which picks out:

1. All hashtags starting with alpha and containing 0 or more numbers, with _ 
and - at any position
2. All hashtags starting with numbers and containing 1 or more alpha, with _ 
and - at any position

This RegEx works well but it's still not quite right. The problem is that if 
the films 2001 or 2010 were hashtags, e.g. #2001 or #2010, they would get 
missed by the RegEx because they contain no letters. All other hashtags are 
recognised just fine and HTML entities are ignored, so for the most part it's 
better than the original RegEx as widely used.
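
The behaviour described above can be checked with a small Python sketch. 
Python is only a stand-in here, the sample string and the find_tags helper are 
made up for illustration, and CF's engine may not behave identically.

```python
import re

# The two-branch pattern from this post, with a single # (no CF escaping).
STRICTER = re.compile(
    r"#(([a-z_\-]+[0-9_\-]*[a-z0-9_\-]+)|([0-9_\-]+[a-z_\-]+[a-z0-9_\-]+))",
    re.IGNORECASE,
)

def find_tags(text):
    """Return the text after # for every match (outermost capture group)."""
    return [m.group(1) for m in STRICTER.finditer(text)]

# Hypothetical sample: one entity, one normal hashtag, two all-digit hashtags.
sample = "I can&#39;t wait for #mytrendingtopic, #2010 or #2001"
tags = find_tags(sample)
print(tags)  # ['mytrendingtopic']
```

The entity is skipped, which is the goal, but #2010 and #2001 are skipped for 
the same reason: neither branch can match a run of digits with no letter.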

I've been working on a fix for this problem and have been looking at using 
lookahead and lookbehind, but it seems CF doesn't support all the features I 
need, i.e. no negative lookbehind.

So if anyone can improve on my current RegEx so I can pick out #mytrendingtopic 
but not #39 from the above example, I'd appreciate it very, very much...
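
One lookbehind-free idea, offered as a sketch rather than a tested CF solution: 
match the character in front of the # as part of the pattern and require that 
it is not an &. The hashtag then sits in the second capture group. The Python 
below is only an illustration; the pattern name and sample string are made up, 
and whether CF's engine handles this the same way is an assumption.

```python
import re

# Group 1: start of string or any single character other than '&'.
# Group 2: the hashtag text itself.
NO_ENTITY = re.compile(r"(^|[^&])#([a-z0-9_\-]+)", re.IGNORECASE)

# Hypothetical sample: one entity, one normal hashtag, two all-digit hashtags.
sample = "I can&#39;t wait for #mytrendingtopic, #2010 or #2001"

# Keep only group 2, dropping the consumed leading character.
tags = [m.group(2) for m in NO_ENTITY.finditer(sample)]
print(tags)  # ['mytrendingtopic', '2010', '2001']
```

A known limitation of this sketch: because the leading character is consumed 
by the match, a hashtag that directly follows another one with nothing in 
between (e.g. #one#two) would lose its second tag.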

Paul

Archive: 
http://www.houseoffusion.com/groups/cf-talk/message.cfm/messageid:325319