Sure. But you've already convinced me we need a new feature. ;-)

Karl
On Tue, Jun 21, 2011 at 3:50 AM, Erlend Garåsen <[email protected]> wrote:
>
> Sure, I can create a ticket. But first I want to discuss this issue with the
> two search consultants we have hired.
>
> I decided to post to the dev list in order to get some feedback on this
> issue.
>
> Erlend
>
> On 20.06.11 18.00, Karl Wright wrote:
>>
>> Hi Erlend,
>>
>> The inclusions and exclusions are based solely on URL, and block the
>> connector from fetching the file. Otherwise you would easily wind up
>> fetching the entire web.
>>
>> However, this raises an interesting issue as to whether there's a way
>> in the web connector to do what you are trying to do, which is to
>> filter based on URL after links have been extracted. The current
>> inclusions/exclusions work fine for any URLs without links but do not
>> allow for the case you are looking for.
>>
>> Can you create a ticket? The suggestion would be to introduce
>> post-extraction inclusions and exclusions into the connector.
>>
>> Karl
>>
>>
>> On Mon, Jun 20, 2011 at 10:53 AM, Erlend Garåsen
>> <[email protected]> wrote:
>>>
>>> I just realized that if I exclude HTML files for a job, links in these
>>> files will not be followed. Is this desirable behaviour? Should links be
>>> followed regardless of the exclude filter?
>>>
>>> I discovered this issue when I was going to crawl only PDFs and realized
>>> that the job ended without finding any documents at all. I think I had
>>> something like this in my include list:
>>> http://foreninger.uio.no/.*\.pdf$
>>> http://folk.uio.no/.*\.pdf$
>>>
>>> Erlend
>>>
>>> --
>>> Erlend Garåsen
>>> Center for Information Technology Services
>>> University of Oslo
>>> P.O. Box 1086 Blindern, N-0317 OSLO, Norway
>>> Ph: (+47) 22840193, Fax: (+47) 22852970, Mobile: (+47) 91380968, VIP: 31050
>>>
>
>
> --
> Erlend Garåsen
> Center for Information Technology Services
> University of Oslo
> P.O. Box 1086 Blindern, N-0317 OSLO, Norway
> Ph: (+47) 22840193, Fax: (+47) 22852970, Mobile: (+47) 91380968, VIP: 31050
>
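The following is a minimal Python sketch, not ManifoldCF code, that contrasts two approaches. The first is the current URL-based inclusion list: a URL that does not match is never fetched, so its links are never extracted. The second is the post-extraction filter Karl proposes: pages are fetched and their links followed, and a second pattern list decides what gets indexed. The in-memory `SITE` link graph, its example URLs, and the `crawl` function with its `fetch_includes`/`index_includes` parameters are all hypothetical. The sketch also assumes a pattern counts as a match if it is found anywhere in the URL.

```python
import re
from collections import deque

# Hypothetical in-memory "web": URL -> outgoing links. Illustrative only.
SITE = {
    "http://folk.uio.no/index.html": [
        "http://folk.uio.no/a.html",
        "http://folk.uio.no/report.pdf",
    ],
    "http://folk.uio.no/a.html": ["http://folk.uio.no/notes.pdf"],
    "http://folk.uio.no/report.pdf": [],
    "http://folk.uio.no/notes.pdf": [],
}

SEED = "http://folk.uio.no/index.html"
PDF_ONLY = [r"http://folk.uio.no/.*\.pdf$"]


def matches_any(url, patterns):
    # Assumption: a pattern matches if it is found anywhere in the URL.
    return any(re.search(p, url) for p in patterns)


def crawl(seed, fetch_includes, index_includes=None):
    """Breadth-first crawl over SITE (hypothetical helper).

    fetch_includes: a URL must match one of these to be fetched at all,
        modelling the current inclusion list (no fetch, no link extraction).
    index_includes: optional patterns applied after link extraction to
        decide which fetched documents are indexed, modelling the proposed
        post-extraction filter. None means index everything fetched.
    """
    indexed = []
    seen = {seed}
    queue = deque([seed])
    while queue:
        url = queue.popleft()
        if not matches_any(url, fetch_includes):
            continue  # never fetched, so its links are never discovered
        for link in SITE.get(url, []):
            if link not in seen:
                seen.add(link)
                queue.append(link)
        if index_includes is None or matches_any(url, index_includes):
            indexed.append(url)
    return indexed


# Current behaviour: the HTML seed fails the PDF-only pattern, so nothing is found.
print(crawl(SEED, PDF_ONLY))  # []

# Proposed behaviour: fetch the whole host, index only the PDFs.
print(crawl(SEED, [r"http://folk.uio.no/"], PDF_ONLY))
# ['http://folk.uio.no/report.pdf', 'http://folk.uio.no/notes.pdf']
```

When the PDF pattern is the only fetch filter, the HTML seed page is rejected and the crawl finds nothing, which is the symptom Erlend describes. Widening the fetch scope and moving the PDF pattern to a post-extraction stage lets the crawler follow links through the HTML pages and still index only the PDFs.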
