Re: cfindex is taking forever - and one more question

2015-04-13 Thread Les Mizzell

I've optimized things as much as I could by building a number of 
collections and limiting each to a specific doc type.
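
For anyone following along, the per-type split described above could be sketched roughly as below. The collection names (docDEPO_pdf and so on) and the one-hour request timeout are made up for illustration, and the sketch assumes each collection has already been created with cfcollection.

```cfml
<!--- Hypothetical mapping: collection name -> extensions it should index --->
<cfset docTypes = structNew()>
<cfset docTypes["docDEPO_pdf"]    = ".pdf">
<cfset docTypes["docDEPO_word"]   = ".doc,.docx">
<cfset docTypes["docDEPO_excel"]  = ".xls,.xlsx">
<cfset docTypes["docDEPO_slides"] = ".ppt,.pptx,.ppsx">
<cfset docTypes["docDEPO_text"]   = ".txt">

<!--- Give the request room to finish; 3600 seconds is an arbitrary guess --->
<cfsetting requesttimeout="3600">

<!--- Index the same folder once per collection, restricted by extension --->
<cfloop collection="#docTypes#" item="collectionName">
    <cfindex
        collection="#collectionName#"
        action="refresh"
        type="path"
        key="#req.path#\documentdepot\"
        extensions="#docTypes[collectionName]#"
        language="English"
        status="indexStatus">
</cfloop>
```

cfsearch takes a comma-separated list in its collection attribute, so the split collections can still be searched together.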

Next question!!

I'm trying to return a few sentences from each doc with the search term 
highlighted. So, I use ContextPassages like below.

<cfsearch name="searchResults"
    collection="docDEPO"
    criteria="#form.sch#"
    contextHighlightBegin="<b>"
    contextHighlightEnd="</b>"
    contextPassages="4"
    contextBytes="500">

However, #searchResults.context# very rarely gives me anything. Out of 
30 returned documents, maybe only 3 return content for 
#searchResults.context#. Usually it's empty or null.
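
Not a fix for the empty context itself, but as a stopgap the output loop could fall back to the summary column whenever context comes back empty. This is only a sketch: the markup is illustrative, and it assumes the summary column is populated for these documents.

```cfml
<cfsearch name="searchResults"
    collection="docDEPO"
    criteria="#form.sch#"
    contextHighlightBegin="<b>"
    contextHighlightEnd="</b>"
    contextPassages="4"
    contextBytes="500">

<cfoutput query="searchResults">
    <p>
        #htmlEditFormat(searchResults.title)# (score: #searchResults.score#)<br>
        <cfif len(trim(searchResults.context))>
            <!--- context already carries the <b> highlight tags, so output it as-is --->
            #searchResults.context#
        <cfelse>
            <!--- fallback: plain summary, no highlighting --->
            #htmlEditFormat(searchResults.summary)#
        </cfif>
    </p>
</cfoutput>
```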

Suggestions?

~|
Order the Adobe Coldfusion Anthology now!
http://www.amazon.com/Adobe-Coldfusion-Anthology/dp/1430272155/?tag=houseoffusion
Archive: 
http://www.houseoffusion.com/groups/cf-talk/message.cfm/messageid:360462
Subscription: http://www.houseoffusion.com/groups/cf-talk/subscribe.cfm
Unsubscribe: http://www.houseoffusion.com/groups/cf-talk/unsubscribe.cfm


RE: cfindex is taking forever

2015-04-09 Thread Kevin Parker

This is not a CF solution, but it may at least reduce what the indexer has
to trawl through. In any case, it will help anything else that has to call
or access the documents.

This is for PDF files, but you might consider converting Office files to
PDF, at your discretion of course. A properly prepared PDF version of an
Office document can be as small as a quarter of the size of the source
document. That's useful, and print and view quality is not compromised.

I'm a big fan of PDF, but unfortunately it's a file format that suffers a
lot from bad file preparation. The result is unnecessarily big files,
amongst other things.

Try optimising all the PDFs to see if this reduces the size of some of the
files - I suspect it might.  You'll need to check that the output settings
(e.g. print resolution, image resolution etc.) are suitable for the end
purpose of the document but from my experience the default settings are
usually quite good.

The good news is you can automate this process over the entire file system
with Acrobat Pro's batch feature.
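
Before running a batch optimisation, it may help to see which PDFs are the big ones. A rough sketch using cfdirectory follows; the 10 MB threshold is an arbitrary assumption, and the folder path is the one from the original post.

```cfml
<!--- Assumption: anything over 10 MB is worth a look; adjust to taste --->
<cfset sizeLimit = 10 * 1024 * 1024>

<!--- List every PDF under the document folder, largest first --->
<cfdirectory
    action="list"
    directory="#req.path#\documentdepot\"
    filter="*.pdf"
    recurse="yes"
    sort="size DESC"
    name="pdfFiles">

<cfoutput query="pdfFiles">
    <cfif pdfFiles.type eq "File" and pdfFiles.size gt sizeLimit>
        #pdfFiles.directory#\#pdfFiles.name#: #round(pdfFiles.size / 1048576)# MB<br>
    </cfif>
</cfoutput>
```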

Hope that helps in some way!


++
Kevin Parker

++

-----Original Message-----
From: Les Mizzell [mailto:lesm...@bellsouth.net] 
Sent: Thursday, 9 April 2015 8:23 AM
To: cf-talk
Subject: cfindex is taking forever


I'm working on building a search interface for a document depot on a site.
The document folder has files going all the way back to 2005, and includes a
number of 10+ meg PDF files, a few that are over 20 megs, and countless Word
and Excel files and PowerPoint presentations.

I don't have access to the CFAdministrator, so:

<cfcollection
    action="create"
    categories="no"
    collection="docDEPO"
    engine="verity"
    language="English"
    path="#req.path#\collections\">

<cfindex
    collection="docDEPO"
    action="refresh"
    type="path"
    key="#req.path#\documentdepot\"
    language="English"
    status="info"
    extensions=".pdf,.pptx,.docx,.doc,.xls,.xlsx,.ppsx,.txt,.ppt">


The collection was created successfully as far as I can tell. However,
indexing has been running (or at least the wheel on my browser is still
turning) for almost 3 hours now. I'm going to forget about it and go mow my
grass and see what's happening when I finish.

I'm thinking though ... too much stuff to index? Or is that amount of time
not out of line for a very large collection of files?
Also, I've not been able to find a list of legally accepted extensions. 
I might have something listed that's just going to cause it to crap out
anyway.

Thoughts? Try something else? What exactly?



Archive: 
http://www.houseoffusion.com/groups/cf-talk/message.cfm/messageid:360439


Re: cfindex is taking forever

2015-04-08 Thread Byron Mann

Not in front of a computer right now, but there is an option in the
cfcollection tag to list collections or get a collection's details
(something like that). Pretty sure that gives you the record or document
count and maybe even the size.

I think that is accessible while indexing is happening. You could possibly
write a quick script to see how far along things are.
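
A quick script along those lines might look like the sketch below. It uses cfcollection's list action, which returns a query with DOCCOUNT and SIZE columns among others. Whether those numbers update while an index run is still in progress is an assumption here, not something confirmed, and the script would need to run as a separate request from the one doing the indexing.

```cfml
<!--- Fetch details for all Verity collections the server knows about --->
<cfcollection action="list" name="allCollections" engine="verity">

<!--- Print the row for the collection from the original post --->
<cfoutput query="allCollections">
    <cfif allCollections.name eq "docDEPO">
        #allCollections.name#:
        #allCollections.doccount# documents,
        size #allCollections.size#,
        last modified #allCollections.lastmodified#<br>
    </cfif>
</cfoutput>
```

Reloading the page every few minutes and watching the document count would give a rough idea of progress against the ~3000 files.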

On Apr 8, 2015 6:51 PM, Les Mizzell lesm...@bellsouth.net wrote:


>> That doesn't actually sound unreasonable, but it might be useful to
>> come up with a document count more specific than "very large".
>
> Approx 3000 documents - around 3 GB of data
> ... it's still running from what I can tell.


Archive: 
http://www.houseoffusion.com/groups/cf-talk/message.cfm/messageid:360437


Re: cfindex is taking forever

2015-04-08 Thread Dave Watts

> The collection was created successfully as far as I can tell. However,
> indexing has been running (or at least the wheel on my browser is still
> turning) for almost 3 hours now. I'm going to forget about it and go mow
> my grass and see what's happening when I finish.
>
> I'm thinking though ... too much stuff to index? Or is that amount of time
> not out of line for a very large collection of files?

That doesn't actually sound unreasonable, but it might be useful to
come up with a document count more specific than "very large".

> Thoughts? Try something else? What exactly?

Have you considered Solr instead of Verity? Not that this would solve
the problem of indexing a lot of files, specifically.
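
For reference, switching engines is mostly a matter of the engine attribute on cfcollection. The sketch below assumes a ColdFusion version that ships Solr (9 or later) with the Solr service running; the collection name docDEPO_solr is made up for illustration.

```cfml
<!--- Create a Solr-backed collection alongside the existing Verity one --->
<cfcollection
    action="create"
    collection="docDEPO_solr"
    engine="solr"
    path="#req.path#\collections\">

<!--- Index the same folder into the new collection --->
<cfindex
    collection="docDEPO_solr"
    action="refresh"
    type="path"
    key="#req.path#\documentdepot\"
    extensions=".pdf,.pptx,.docx,.doc,.xls,.xlsx,.ppsx,.txt,.ppt"
    language="English"
    status="info">
```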

Dave Watts, CTO, Fig Leaf Software
1-202-527-9569
http://www.figleaf.com/
http://training.figleaf.com/

Fig Leaf Software is a Service-Disabled Veteran-Owned Small Business
(SDVOSB) on GSA Schedule, and provides the highest caliber vendor-
authorized instruction at our training centers, online, or onsite.

Archive: 
http://www.houseoffusion.com/groups/cf-talk/message.cfm/messageid:360433


Re: cfindex is taking forever

2015-04-08 Thread Les Mizzell

> I'm going to forget about it and go mow my grass and see what's
> happening when I finish.

Well crap, somebody stole my lawnmower. This is why we can't have nice 
things.

Archive: 
http://www.houseoffusion.com/groups/cf-talk/message.cfm/messageid:360434


Re: cfindex is taking forever

2015-04-08 Thread Russ Michaels

You also have to take your disk IOPS into consideration. If you are on a
VPS, disk performance will be much slower, especially if it's not on SSD,
and actions like this can take a lot longer.

On Wed, Apr 8, 2015 at 11:32 PM, Les Mizzell lesm...@bellsouth.net wrote:


>> I'm going to forget about it and go mow my grass and see what's
>> happening when I finish.
>
> Well crap, somebody stole my lawnmower. This is why we can't have nice
> things

 

Archive: 
http://www.houseoffusion.com/groups/cf-talk/message.cfm/messageid:360435


Re: cfindex is taking forever

2015-04-08 Thread Les Mizzell

> That doesn't actually sound unreasonable, but it might be useful to
> come up with a document count more specific than "very large".

Approx 3000 documents - around 3 GB of data
... it's still running from what I can tell.

Archive: 
http://www.houseoffusion.com/groups/cf-talk/message.cfm/messageid:360436