Hi,

Makes sense. I used the same approach recently to speed up an
ingestion process with Any23 at the end of the pipeline.
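
A minimal sketch of that kind of regex pre-filter, in case it helps the
discussion. The class name, method names, and patterns below are
illustrative assumptions: they are not the table published on the Web
Data Commons site and not part of the Any23 API. The actual Any23
extraction call is left out; the idea is that it would only run when
worthParsing() returns true.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

public class MicroformatPrefilter {

    // Hypothetical patterns for illustration only; NOT the Web Data Commons table.
    // They are deliberately loose: a false positive only costs one extra extraction.
    private static final Map<String, Pattern> PATTERNS = new LinkedHashMap<>();
    static {
        PATTERNS.put("hcard", Pattern.compile(
                "class\\s*=\\s*[\"'][^\"']*\\bvcard\\b", Pattern.CASE_INSENSITIVE));
        PATTERNS.put("hcalendar", Pattern.compile(
                "class\\s*=\\s*[\"'][^\"']*\\bvevent\\b", Pattern.CASE_INSENSITIVE));
        PATTERNS.put("rdfa", Pattern.compile(
                "\\s(property|typeof|about)\\s*=", Pattern.CASE_INSENSITIVE));
        PATTERNS.put("microdata", Pattern.compile(
                "\\s(itemscope|itemprop)\\b", Pattern.CASE_INSENSITIVE));
    }

    // Returns the names of all formats whose pattern is found in the page.
    public static List<String> detectedFormats(String html) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, Pattern> entry : PATTERNS.entrySet()) {
            if (entry.getValue().matcher(html).find()) {
                hits.add(entry.getKey());
            }
        }
        return hits;
    }

    // True when at least one pattern matches, i.e. the page is worth
    // handing to the (much slower) extractor.
    public static boolean worthParsing(String html) {
        return !detectedFormats(html).isEmpty();
    }
}
```

A caller would check MicroformatPrefilter.worthParsing(html) first and
skip the page entirely when it returns false.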

my 2 cents

On Fri, Mar 23, 2012 at 12:16 PM, Szymon Danielczyk
<[email protected]> wrote:
> Hi
> Paragraph from their website
>
> "Our solution is to run (Java) regular expressions against each
> webpages prior to extraction, which detect the presence of a
> microformat in a HTML page, and then only run the Any23 extractor when
> the regular expression find potential matches."
>
> Are we using any techniques like that to decide whether there is
> anything to parse in the document?
> Maybe we could build in such a feature, like a method/filter for users
> who want to parse a huge number of docs,
> to detect whether a document is worth parsing.
>
> They have a table with the regexes they used for each format.
> Any opinions about this?
>
> Szymon
>
> On 23 March 2012 10:38, Davide Palmisano <[email protected]> wrote:
>> Thanks Michele,
>>
>> this is great news.
>>
>> Should we have a section on the web site listing
>> all the products/initiatives that are using Any23?
>>
>> On Fri, Mar 23, 2012 at 11:01 AM, Michele Mostarda
>> <[email protected]> wrote:
>>> Hi Guys,
>>>
>>>   just a curiosity:
>>>
>>>    Any23 has recently been used to parse the entire corpus of Semantic
>>> Web Data existing on the Web [0].
>>>
>>> The best.
>>>
>>> Mic
>>>
>>> [0] http://webdatacommons.org/
>>>
>>> --
>>> Michele Mostarda
>>> Senior Software Engineer
>>> skype: michele.mostarda
>>> twitter: micmos
>>> mail: [email protected]
>>> site : http://www.michelemostarda.com
>>
>>
>>
>> --
>> Davide Palmisano
>>
>> http://davidepalmisano.com
>> http://twitter.com/dpalmisano



-- 
Davide Palmisano

http://davidepalmisano.com
http://twitter.com/dpalmisano
