Strickland, Douglas M. wrote:

> Using cffile, I have a form that allows users to upload their resume
> (MSWord documents (.doc files)) to our system.
>  
> What I wanted to do next was...as each document is uploaded,
> read/retrieve the text out of it, and insert it into a table (SQL Server
> database). The reason I wanted to do this, was to make it easier to
> search through the resume data. Any suggestions on how to accomplish
> this would be appreciated.
>  
> I attempted to read the file by using cffile, and was able to read it;
> however, there was also a lot of 'garbage' along with the text that was
> in the file.

That's going to be a world of pain.

Word docs are in their own proprietary format, and as you've probably
discovered, it's all binary. Then again, it could be possible to read
through it all using the Word COM objects. I haven't done this myself,
though, so I can't help you with that.
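
To illustrate where the 'garbage' comes from, here is a rough sketch in Python
(not ColdFusion, and not the COM route mentioned above). It is a crude
fallback that pulls runs of printable characters out of a binary blob, much
like the Unix `strings` tool. The function name `extract_text_runs` and the
`min_len` parameter are made up for this example, and it assumes the text is
stored either one byte per character or as 16-bit little-endian characters.
It will still pick up stray metadata and miss formatting, so treat it as a
way to feed a keyword search at best, not as a real Word parser.

```python
import re


def extract_text_runs(data: bytes, min_len: int = 4) -> list[str]:
    """Return runs of printable text found in a binary blob, in file order.

    Hypothetical helper: it knows nothing about the Word format itself.
    """
    # Runs of printable ASCII stored one byte per character.
    ascii_pat = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    # Runs of printable ASCII stored as 16-bit little-endian characters
    # (each character followed by a zero byte).
    utf16_pat = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % min_len)

    runs = []
    for match in ascii_pat.finditer(data):
        runs.append((match.start(), match.group().decode("ascii")))
    for match in utf16_pat.finditer(data):
        runs.append((match.start(), match.group().decode("utf-16-le")))

    # Sort by byte offset so the text comes back in the order it was stored.
    runs.sort()
    return [text for _, text in runs]


if __name__ == "__main__":
    # Made-up bytes standing in for an uploaded file: binary noise around
    # one 8-bit run and one 16-bit run.
    sample = (
        b"\x00\x01\xd0\xcf"
        + b"Senior developer"
        + b"\xff\x00\x02"
        + "SQL Server".encode("utf-16-le")
        + b"\x00\x00\x03"
    )
    print(extract_text_runs(sample))
```

Anything shorter than `min_len` characters is dropped, which filters out most
of the noise but also loses short words that sit on their own.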

But even if they were uploading plain text or you can extract all the
text of the Word doc using the COM interface, how would you identify the
meaning of the bits of text in it? That's an even bigger problem. If you
used a standard template for the CVs with the elements identified using
styles, you might have half a chance, but I doubt anybody'd stand for
that, and you're always going to get a fair few that have been screwed
up.

At best, you'd have to get a human to parse them: these things need too
much context for a computer to parse them reliably.

Your best bet is to populate the database tables from forms, and provide
a facility to upload the CVs in whatever format they're in.
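
A minimal sketch of that arrangement, in Python with SQLite standing in for
SQL Server so it runs anywhere. The table name `candidates`, its columns and
the helper functions are all invented for illustration; the point is only
that the searchable data comes from form fields, while the uploaded file is
stored untouched and referenced by its path.

```python
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    # Hypothetical table: searchable columns filled in from the form,
    # plus the location of the uploaded CV in its original format.
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS candidates (
            id        INTEGER PRIMARY KEY,
            full_name TEXT NOT NULL,
            skills    TEXT NOT NULL,
            cv_path   TEXT NOT NULL
        )
        """
    )


def save_candidate(conn, full_name, skills, cv_path):
    # Parameterised query: form input never gets spliced into the SQL.
    cur = conn.execute(
        "INSERT INTO candidates (full_name, skills, cv_path) VALUES (?, ?, ?)",
        (full_name, skills, cv_path),
    )
    conn.commit()
    return cur.lastrowid


def find_by_skill(conn, keyword):
    # Search only the form-supplied column; the CV file is never parsed.
    rows = conn.execute(
        "SELECT full_name, cv_path FROM candidates "
        "WHERE skills LIKE ? ORDER BY id",
        (f"%{keyword}%",),
    )
    return rows.fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    save_candidate(conn, "A. Example", "ColdFusion, SQL Server", "uploads/a_example.doc")
    save_candidate(conn, "B. Sample", "Java, Oracle", "uploads/b_sample.doc")
    print(find_by_skill(conn, "SQL"))
```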

Sorry,

K.
