Re: URGENT: Regex, text documents, & CF5 question

Ben Doom Fri, 16 Sep 2005 07:21:56 -0700

First, your ODBC problem.  You can set up a DSN in Win2k, which is an 
ODBC connection to the DB.  CF 6 and (I presume) 7 can connect to that 
without (anecdotally) much overhead.


Second, your RegEx problem.
I wouldn't use RegEx at all here.  Just use a simple replace() to do 
this.  You aren't doing anything you need regex for here.
But to explain your problem anyway, brackets [] denote a character 
class, which does its best to take the contents literally.  Therefore, 
[\t] matches either a backslash or the letter t.  Besides which, the \t 
construct isn't supported in CF5 (I don't think) anyway.  :-)

So, what I would do is simply
replace(text, chr(9), "&nbsp;&nbsp;", "all")
(9 being the ASCII code for a tab).
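Spelled out a little more, here is a minimal sketch of that approach. The variable names docText, spaced and collapsed are made up for illustration, and I haven't run this against your documents, so treat it as a starting point rather than a finished solution. The second call is an optional variation for the case where you do want a regex: it puts the actual tab character into the pattern with chr(9) instead of relying on \t, so a run of consecutive tabs is replaced once.

```cfm
<!--- Hypothetical sample text: two columns separated by two tab characters. --->
<cfset docText = "Name" & Chr(9) & Chr(9) & "Qty">

<!--- Plain string replacement: each tab becomes two non-breaking spaces. --->
<cfset spaced = Replace(docText, Chr(9), "&nbsp;&nbsp;", "ALL")>

<!--- Regex variation: Chr(9) inserts a literal tab into the pattern,
      so "+" matches a run of one or more tabs and replaces the run once. --->
<cfset collapsed = REReplace(docText, "#Chr(9)#+", "&nbsp;&nbsp;&nbsp;&nbsp;", "ALL")>

<cfoutput>
#spaced#<br>
#collapsed#
</cfoutput>
```

The plain Replace() keeps the number of spaces proportional to the number of tabs, while the regex version turns any run of tabs into one fixed-width gap. Which is better depends on how the columns were laid out in the original documents.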

--Ben

Pete Ruckelshaus wrote:
> Background: I am building an application that is going to replace a
> client-server document publication app.  The old app was custom
> written and used MS Word as the editor to save and store content for
> the documents that needed to be published.  It was poorly implemented,
> but it worked well enough.
> 
> In rebuilding the app, I'm importing the content from those MS Word
> docs using a cfx tag that's converting the word doc into straight
> text, removing all formatting and other junk but leaving all text and
> whitespace characters.  For a couple of reasons (first, the original
> database only has basic ODBC drivers and I couldn't get it to work
> with MX, second the huge performance hit when using COM with CFMX --
> importing 14k docs took 8+ hours on MX and less than 90 minutes with
> CF5) I am using ColdFusion 5 on a dev server running Win2K Server. 
> Many of these docs have tabs in  them for formatting basic columnar
> data; the tab does not, of course, have any real meaning in the web
> world (and browsers collapse whitespace), so the result is that all of
> the columnar data is out of whack.
> 
> My first thought is to use rereplace to find tab chars and replace
> them with NBSP's.  I am doing the following:
> 
> <cfset tmp.content = rereplace(tmp.content, "[\t]+",
> "&nbsp;&nbsp;&nbsp;&nbsp;", "ALL")>
> 
> Which I thought would replace tabs, but it's instead replacing all of
> the "t" characters.  What's wrong with my regex code?
> 
> Next, I would actually prefer to somehow parse out these docs and
> instead of using non-breaking spaces, I'd like to wrap that columnar
> data in a table.  Has anyone done this before, and if so, what was
> your approach?  String parsing isn't my strong suit.
> 
> If necessary, I COULD do a basic Word scrape in CF5 but then do a more
> advanced content manipulation using CFMX7.
> 
> ANY help would be appreciated.  I'd appreciate a free solution, but if
> a good pay solution exists, let me know.
> 
> Thanks,
> 
> Pete
> 
> 
