Hello Joe,

> Now I'd like to capture and index the count of forward slash characters
> '/'

It seems you are trying to do that with the subcollection plugin. I don't
think that is going to work.

Instead, I would suggest writing a simple index plugin that does the
counting and adds the number of slashes to a field of the NutchDocument
object that is passed to the plugin.

Check out the index-basic plugin as an example.
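
As a rough sketch of the counting part only (not a complete plugin), the
helper below counts the '/' characters in a URL string. The class name
SlashCounter and the method name countSlashes are made up for this
illustration. In a real plugin you would presumably call something like
it from your indexing filter and add the result to the NutchDocument
under a field name of your choosing, much as index-basic adds its own
fields.

```java
// Hypothetical helper for illustration; not part of Nutch.
final class SlashCounter {
    private SlashCounter() {}

    /** Returns how many '/' characters occur in the given URL string. */
    static int countSlashes(String url) {
        if (url == null) {
            return 0; // treat a missing URL as having no slashes
        }
        int count = 0;
        for (int i = 0; i < url.length(); i++) {
            if (url.charAt(i) == '/') {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Example URL is an assumption, chosen only to exercise the helper.
        System.out.println(countSlashes("https://example.com/a/b/"));
    }
}
```

Note that this counts every slash, including the two after the scheme, so
you may want to strip "https://" first if you only care about the path
depth.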

Regards,
Markus

On Sun, 26 Feb 2023 at 00:57, Gilvary, Joseph
<joseph.gilv...@uspto.gov.invalid> wrote:

> Happy Saturday/Sunday,
>
> I parse some values with index-replace to get some strings I want to
> index, like:
>
>       id:dirsubcollection="https?:\/\/(.*?)([^\/]*)$"$1"
>       dirsubcollection="^[a-zA-Z0-9\.-]*\/"
>
>       id:lastsubdir="https?:\/\/(.*?)([^\/]*)$"$1"
>       lastsubdir="\/$"
>       lastsubdir="[a-zA-Z0-9\._-]*\/"
>
> Now I'd like to capture and index the count of forward slash characters
> '/' but I don't see a way to pull that from this plugin. Is there some
> other plugin I should look at? I appreciate any suggestions to solve this.
>
>  Thanks, stay safe, stay healthy,
>
>  Joe
>
