Houston, we have a problem...  Parsing in the regression run took
orders of magnitude longer than normal.  It looks like something is
going seriously wrong (or at least differently?) with rfc822 and mbox;
both are more than 1,000x slower in the table below.

The compressed and archive files (tar, gtar, lzma, gzip, xz) are taking
a good deal longer, too, but those might contain rfc822/mbox?!

The following columns are hard to read, but...

mimeA mimeB totalTimeMillisA totalTimeMillisB timeB/timeA

"message/rfc822" "message/rfc822" "570881" "777473987" "  1361.885"
"application/mbox" "application/mbox" "343222" "429843585" "  1252.378"
"application/x-tar" "application/x-tar" "38245" "687057" "    17.965"
"application/x-gtar" "application/x-gtar" "218006" "2664526" "    12.222"
"application/x-lzma" "application/x-lzma" "30094" "242486" "     8.058"
"application/gzip" "application/gzip" "1775051" "8663969" "     4.881"
"application/x-xz" "application/x-xz" "279543" "1139716" "     4.077"
"application/pkcs7-signature" "application/pkcs7-signature" "8323" "23089" "     2.774"
"application/x-archive" "application/x-archive" "780542" "1856648" "     2.379"
"application/vnd.ms-equation" "application/vnd.ms-equation" "85977" "201752" "     2.347"
"image/vnd.adobe.photoshop" "image/vnd.adobe.photoshop" "11676" "25281" "     2.165"
"text/x-c++src" "text/x-c++src" "13618" "29240" "     2.147"
"text/html; charset=windows-1256" "text/html; charset=windows-1256" "5955" "12333" "     2.071"
"text/x-vbasic; charset=windows-1252" "text/x-vbasic; charset=windows-1252" "150542" "260873" "     1.733"
"application/xhtml+xml; charset=windows-1256" "application/xhtml+xml; charset=windows-1256" "23400" "39906" "     1.705"
"application/javascript; charset=UTF-8" "application/javascript; charset=UTF-8" "14335" "24389" "     1.701"
"text/csv; charset=ISO-8859-1" "text/csv; charset=ISO-8859-1; delimiter=comma" "153222" "256983" "     1.677"
"application/java-vm" "application/java-vm" "138782" "229235" "     1.652"
"text/csv; charset=windows-1252" "text/csv; charset=windows-1252; delimiter=comma" "136167" "218314" "     1.603"
"application/vnd.ms-graph" "application/vnd.ms-graph" "37521" "57798" "     1.540"
"application/xml" "application/xml" "5029015" "7678914" "     1.527"
"text/x-java-source" "text/x-java-source" "38214" "58216" "     1.523"
"application/java-archive" "application/java-archive" "344629" "520414" "     1.510"
"application/x-bzip2" "application/x-bzip2" "88204" "123552" "     1.401"
"application/x-tika-msoffice-embedded; format=comp_obj" "application/x-tika-msoffice-embedded; format=comp_obj" "22003" "30117" "     1.369"
"text/x-log; charset=windows-1252" "text/x-log; charset=windows-1252" "26314" "35492" "     1.349"
"application/x-sh; charset=ISO-8859-1" "application/x-sh; charset=ISO-8859-1" "32229" "42902" "     1.331"
"application/x-shockwave-flash" "application/x-shockwave-flash" "146758" "194142" "     1.323"
"application/vnd.ms-spreadsheetml" "application/vnd.ms-spreadsheetml" "21311" "27945" "     1.311"
"image/x-portable-bitmap" "image/x-portable-bitmap" "32990" "41899" "     1.270"
"text/plain; charset=windows-1252" "text/tsv; charset=windows-1252; delimiter=tab" "123082" "152588" "     1.240"
"application/postscript" "application/postscript" "2137277" "2637633" "     1.234"
"text/x-log; charset=ISO-8859-1" "text/x-log; charset=ISO-8859-1" "216432" "265464" "     1.227"
"text/html; charset=US-ASCII" "text/html; charset=US-ASCII" "27717" "33919" "     1.224"
"text/plain; charset=ISO-8859-1" "text/tsv; charset=ISO-8859-1; delimiter=tab" "278654" "329044" "     1.181"
"application/rss+xml" "application/rss+xml" "120481" "141826" "     1.177"
"application/x-rar-compressed" "application/x-rar-compressed" "18543" "21774" "     1.174"
"application/vnd.google-earth.kml+xml" "application/vnd.google-earth.kml+xml" "55379" "64505" "     1.165"
"application/vnd.ms-excel.sheet.4" "application/vnd.ms-excel.sheet.4" "108674" "126573" "     1.165"
"image/wmf" "image/wmf" "1599014" "1856928" "     1.161"
"image/emf" "image/emf" "1037173" "1203787" "     1.161"
"application/x-tika-java-web-archive" "application/x-tika-java-web-archive" "30681" "35605" "     1.160"
"text/html; charset=windows-1252" "text/html; charset=windows-1252" "1649080" "1838810" "     1.115"
"text/html; charset=ISO-8859-1" "text/html; charset=ISO-8859-1" "2875476" "3176459" "     1.105"
"application/vnd.ms-excel.sheet.binary.macroenabled.12" "application/vnd.ms-excel.sheet.binary.macroenabled.12" "47608" "52091" "     1.094"
"application/vnd.ms-excel.sheet.macroenabled.12" "application/vnd.ms-excel.sheet.macroenabled.12" "537125" "584365" "     1.088"
"application/vnd.openxmlformats-officedocument.spreadsheetml.template" "application/vnd.openxmlformats-officedocument.spreadsheetml.template" "16709" "18173" "     1.088"
"application/vnd.ms-powerpoint" "application/vnd.ms-powerpoint" "21608104" "23194080" "     1.073"
"application/epub+zip" "application/epub+zip" "730352" "780078" "     1.068"
"application/x-tika-msoffice-embedded; format=ole10_native" "application/x-tika-msoffice-embedded; format=ole10_native" "105164" "111438" "     1.060"
"application/vnd.wordperfect; version=5.1" "application/vnd.wordperfect; version=5.1" "17809" "18839" "     1.058"
"image/x-pict" "image/x-pict" "84620" "89128" "     1.053"
"text/plain; charset=windows-1252" "text/csv; charset=windows-1252; delimiter=comma" "620231" "652918" "     1.053"
"application/xhtml+xml; charset=ISO-8859-1" "application/xhtml+xml; charset=ISO-8859-1" "643964" "668054" "     1.037"
"application/vnd.wordperfect; version=6.x" "application/vnd.wordperfect; version=6.x" "11099" "11487" "     1.035"
"application/vnd.ms-powerpoint.presentation.macroenabled.12" "application/vnd.ms-powerpoint.presentation.macroenabled.12" "86441" "88890" "     1.028"
"text/plain; charset=ISO-8859-1" "text/csv; charset=ISO-8859-1; delimiter=comma" "837886" "858873" "     1.025"
"image/bmp" "image/bmp" "93239" "95551" "     1.025"
"application/rdf+xml" "application/rdf+xml" "27031" "27418" "     1.014"
"application/xhtml+xml; charset=windows-1252" "application/xhtml+xml; charset=windows-1252" "155883" "156842" "     1.006"
"image/gif" "image/gif" "1261751" "1249618" "      .990"
"application/vnd.openxmlformats-officedocument.spreadsheetml.sheet" "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet" "3690888" "3647507" "      .988"
"application/vnd.openxmlformats-officedocument.presentationml.slideshow" "application/vnd.openxmlformats-officedocument.presentationml.slideshow" "973579" "956187" "      .982"
"application/rtf" "application/rtf" "390546" "383240" "      .981"
"text/html; charset=UTF-8" "text/html; charset=UTF-8" "1800532" "1757364" "      .976"
"multipart/related" "multipart/related" "87826" "85500" "      .974"
"application/zip" "application/zip" "2048721" "1961164" "      .957"
"application/pdf" "application/pdf" "88918008" "84629858" "      .952"
"application/vnd.ms-htmlhelp" "application/vnd.ms-htmlhelp" "10763" "10215" "      .949"
"application/vnd.openxmlformats-officedocument.presentationml.presentation" "application/vnd.openxmlformats-officedocument.presentationml.presentation" "5208977" "4877451" "      .936"
"application/vnd.ms-excel" "application/vnd.ms-excel" "9910545" "9259787" "      .934"
"application/fits" "application/fits" "203383" "188616" "      .927"
"application/vnd.openxmlformats-officedocument.wordprocessingml.template" "application/vnd.openxmlformats-officedocument.wordprocessingml.template" "99755" "92083" "      .923"
"application/vnd.ms-word.document.macroenabled.12" "application/vnd.ms-word.document.macroenabled.12" "108546" "99368" "      .915"
"application/x-tika-ooxml" "application/x-tika-ooxml" "16285504" "14863678" "      .913"
"text/x-csrc; charset=ISO-8859-1" "text/x-csrc; charset=ISO-8859-1" "10061" "8988" "      .893"
"application/x-tika-ooxml-protected" "application/x-tika-ooxml-protected" "76261" "67978" "      .891"
"application/vnd.visio" "application/vnd.visio" "265591" "231051" "      .870"
"application/vnd.openxmlformats-officedocument.wordprocessingml.document" "application/vnd.openxmlformats-officedocument.wordprocessingml.document" "12533893" "10606281" "      .846"
"application/x-7z-compressed" "application/x-7z-compressed" "24736" "20670" "      .836"
"application/vnd.android.package-archive" "application/vnd.android.package-archive" "63069" "52426" "      .831"
"application/x-dbf; format=FoxBASE_plus" "application/x-dbf; format=FoxBASE_plus" "165078" "136581" "      .827"
"text/plain; charset=windows-1252" "text/plain; charset=windows-1252" "2005334" "1617215" "      .806"
"text/html; charset=EUC-JP" "text/html; charset=EUC-JP" "135216" "108702" "      .804"
"application/x-msdownload" "application/x-msdownload" "22394" "17967" "      .802"
"application/msword" "application/msword" "25225956" "20140865" "      .798"
"application/xhtml+xml; charset=UTF-8" "application/xhtml+xml; charset=UTF-8" "1369233" "1084872" "      .792"
"image/png" "image/png" "8024951" "6311528" "      .786"
"text/plain; charset=UTF-8" "text/csv; charset=UTF-8; delimiter=comma" "102382" "80396" "      .785"
"application/xhtml+xml; charset=Shift_JIS" "application/xhtml+xml; charset=Shift_JIS" "13145" "10310" "      .784"
"text/plain; charset=UTF-8" "text/tsv; charset=UTF-8; delimiter=tab" "22314" "17184" "      .770"
"text/x-perl; charset=ISO-8859-1" "text/x-perl; charset=ISO-8859-1" "19489" "14817" "      .760"
"text/plain; charset=ISO-8859-1" "text/plain; charset=ISO-8859-1" "4776706" "3578705" "      .749"
"text/html; charset=windows-1251" "text/html; charset=windows-1251" "38319" "28496" "      .744"
"image/jpeg" "image/jpeg" "27188807" "19936262" "      .733"
"application/x-bibtex-text-file; charset=UTF-8" "application/x-bibtex-text-file; charset=UTF-8" "15012" "10946" "      .729"
"text/html; charset=GBK" "text/html; charset=GBK" "14763" "10547" "      .714"
"application/octet-stream" "application/octet-stream" "637102" "450923" "      .708"
"application/vnd.ms-wordml" "application/vnd.ms-wordml" "10319" "7289" "      .706"
"text/plain; charset=UTF-8" "text/plain; charset=UTF-8" "431635" "303633" "      .703"
"application/x-msaccess" "application/x-msaccess" "511222" "357344" "      .699"
"application/vnd.google-earth.kmz" "application/vnd.google-earth.kmz" "178217" "124078" "      .696"
"application/x-executable" "application/x-executable" "32426" "22437" "      .692"
"application/x-ms-asx" "application/x-ms-asx" "81547" "54927" "      .674"
"text/calendar; charset=UTF-8" "text/calendar; charset=UTF-8" "366484" "246181" "      .672"
"text/x-vcalendar; charset=ISO-8859-1" "text/x-vcalendar; charset=ISO-8859-1" "37525" "25065" "      .668"
"text/x-matlab; charset=UTF-8" "text/x-matlab; charset=UTF-8" "14861" "9868" "      .664"
"application/x-tex; charset=ISO-8859-1" "application/x-tex; charset=ISO-8859-1" "19970" "13224" "      .662"
"text/x-php; charset=ISO-8859-1" "text/x-php; charset=ISO-8859-1" "11407" "7497" "      .657"
"application/x-grib" "application/x-grib" "16179" "10526" "      .651"
"text/calendar; charset=ISO-8859-1" "text/calendar; charset=ISO-8859-1" "76839" "49991" "      .651"
"text/plain; charset=UTF-16LE" "text/plain; charset=UTF-16LE" "23474" "15229" "      .649"
"text/calendar; charset=windows-1252" "text/calendar; charset=windows-1252" "574421" "371225" "      .646"
"application/x-bibtex-text-file; charset=windows-1252" "application/x-bibtex-text-file; charset=windows-1252" "49670" "32081" "      .646"
"text/x-matlab; charset=windows-1252" "text/x-matlab; charset=windows-1252" "21890" "14083" "      .643"
"application/x-bibtex-text-file; charset=ISO-8859-1" "application/x-bibtex-text-file; charset=ISO-8859-1" "82478" "52614" "      .638"
"video/x-msvideo" "video/x-msvideo" "24521" "15634" "      .638"
"text/x-vcard; charset=windows-1252" "text/x-vcard; charset=windows-1252" "43197" "27502" "      .637"
"text/x-vcard; charset=ISO-8859-1" "text/x-vcard; charset=ISO-8859-1" "38629" "24563" "      .636"
"text/x-matlab; charset=ISO-8859-1" "text/x-matlab; charset=ISO-8859-1" "66814" "41767" "      .625"
"text/x-diff; charset=ISO-8859-1" "text/x-diff; charset=ISO-8859-1" "28490" "17717" "      .622"
"video/mpeg" "video/mpeg" "99047" "60633" "      .612"
"application/vnd.openxmlformats-officedocument" "application/vnd.openxmlformats-officedocument" "14021" "8541" "      .609"
"application/x-sas-data" "application/x-sas-data" "20148" "12085" "      .600"
"application/x-endnote-refer" "application/x-endnote-refer" "37721" "22427" "      .595"
"audio/vnd.wave" "audio/vnd.wave" "2025503" "1188124" "      .587"
"image/vnd.dwg" "image/vnd.dwg" "54326" "31554" "      .581"
"application/x-mspublisher" "application/x-mspublisher" "827994" "479060" "      .579"
"video/x-flv" "video/x-flv" "17035" "9604" "      .564"
"audio/x-flac" "audio/x-flac" "153508" "86400" "      .563"
"video/quicktime" "video/quicktime" "14326" "7974" "      .557"
"application/x-netcdf" "application/x-netcdf" "150752" "82587" "      .548"
"application/x-dvi" "application/x-dvi" "52402" "28348" "      .541"
"image/tiff" "image/tiff" "549514" "296830" "      .540"
"application/x-tika-msoffice" "application/x-tika-msoffice" "5057216" "2645255" "      .523"
"application/x-mobipocket-ebook" "application/x-mobipocket-ebook" "139851" "72051" "      .515"
"application/x-shapefile" "application/x-shapefile" "13493" "6946" "      .515"
"application/x-hdf" "application/x-hdf" "66076" "33825" "      .512"
"image/vnd.dxf; format=ascii" "image/vnd.dxf; format=ascii" "117691" "59991" "      .510"
"audio/x-ms-wma" "audio/x-ms-wma" "21392" "10785" "      .504"
"image/vnd.djvu" "image/vnd.djvu" "186478" "93119" "      .499"
"image/jp2" "image/jp2" "538909" "265405" "      .492"
"application/vnd.rn-realmedia" "application/vnd.rn-realmedia" "11677" "5678" "      .486"
"video/x-ms-asf" "video/x-ms-asf" "17676" "8387" "      .474"
"audio/x-aiff" "audio/x-aiff" "97338" "45472" "      .467"
"video/x-m4v" "video/x-m4v" "13351" "6207" "      .465"
"audio/mpeg" "audio/mpeg" "1702142" "782377" "      .460"
"image/x-portable-pixmap" "image/x-portable-pixmap" "29323" "13385" "      .456"
"application/xhtml+xml; charset=windows-1251" "application/xhtml+xml; charset=windows-1251" "14584" "6591" "      .452"
"video/3gpp" "video/3gpp" "54044" "23964" "      .443"
"video/x-ms-wmv" "video/x-ms-wmv" "58120" "25581" "      .440"
"video/quicktime" "application/mp4" "28043" "11520" "      .411"
"video/mp4" "video/mp4" "95073" "39042" "      .411"
"application/vnd.apple.keynote" "application/vnd.apple.keynote" "17162" "6532" "      .381"
"application/mp4" "application/mp4" "160396" "48739" "      .304"
"application/x-ms-installer" "application/x-ms-installer" "22090" "4952" "      .224"

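In case anyone wants to slice the numbers above, here is a minimal
Python sketch that recomputes timeB/timeA and lists the rows at or
above a threshold.  It is not part of the regression tooling: the
function names and the 2.0 cutoff are made up for illustration, and it
assumes each row sits on a single line with five double-quoted fields,
as in the sample rows copied from the table.

```python
import shlex


def parse_row(line):
    """Split one quoted row into (mime_a, mime_b, millis_a, millis_b).

    Assumes five double-quoted fields per line; the precomputed ratio
    column is ignored and recomputed by the caller.
    """
    mime_a, mime_b, millis_a, millis_b, _ratio = shlex.split(line)
    return mime_a, mime_b, int(millis_a), int(millis_b)


def slowdowns(lines, threshold=2.0):
    """Return (mime_a, mime_b, ratio) for rows with timeB/timeA >= threshold.

    The threshold is an arbitrary illustration value, not a project rule.
    Results are sorted with the worst slowdown first.
    """
    flagged = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines between rows
        mime_a, mime_b, millis_a, millis_b = parse_row(line)
        ratio = millis_b / millis_a
        if ratio >= threshold:
            flagged.append((mime_a, mime_b, round(ratio, 3)))
    return sorted(flagged, key=lambda row: row[2], reverse=True)


# Three rows copied from the table above.
rows = [
    '"message/rfc822" "message/rfc822" "570881" "777473987" "  1361.885"',
    '"application/x-tar" "application/x-tar" "38245" "687057" "    17.965"',
    '"application/pdf" "application/pdf" "88918008" "84629858" "      .952"',
]

for mime_a, mime_b, ratio in slowdowns(rows):
    print(f"{mime_a} -> {mime_b}: {ratio}x")
```

With these three rows, only rfc822 and x-tar clear the cutoff; pdf got
slightly faster and is left out.
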
On Fri, May 3, 2019 at 1:56 PM Tim Allison <[email protected]> wrote:
>
> All,
>   I've kicked off the regression tests.  I should have results by
> Tuesday.  Let me know if there's anything else you'd like to get in
> before 1.21.  I can rerun the regression tests on Monday if desired.
>
>           Cheers,
>
>                     Tim
>
> On Tue, Apr 23, 2019 at 8:21 PM Konstantin Gribov <[email protected]> wrote:
> >
> > Tim,
> >
> > I'm +1 since I've pushed TIKA-2555/TIKA-2601. But I'm going to look through
> > the ossindex-maven-plugin:audit results.
> >
> > Maybe I'll do some cleanup (like using lambdas instead of anonymous
> > classes, diamond op etc) but that's not a blocker ,)
> >
> > --
> > Best regards,
> > Konstantin Gribov.
> >
> >
> > On Tue, Apr 23, 2019 at 9:04 AM Oleg Tikhonov <[email protected]> wrote:
> >
> > > +1 to wait if needed.
> > >
> > > On Mon, Apr 22, 2019, 23:23 Tim Allison <[email protected]> wrote:
> > >
> > > > All,
> > > >   I just made a bunch of upgrades to our dependencies.  I still want
> > > > to take a first pass at TIKA-2749...maybe by the end of this week with
> > > > release process kicking off the following week?  I could start the
> > > > regression tests now (well, tomorrowish), though, unless anyone has
> > > > anything they want to get in...I'm happy to wait, though, till next
> > > > week to start the regression tests.
> > > >  WDYT?
> > > >
> > > >        Cheers,
> > > >
> > > >                Tim
> > > >
> > > > On Mon, Apr 8, 2019 at 2:25 PM Oleg Tikhonov <[email protected]> wrote:
> > > > >
> > > > > Great!
> > > > > +1.
> > > > > Thanks,
> > > > > Oleg
> > > > >
> > > > > On Mon, Apr 8, 2019, 21:11 Tim Allison <[email protected]> wrote:
> > > > >
> > > > > > All,
> > > > > >   PDFBox will be out in a few days, and POI should be out soon as
> > > > > > well.  I _think_ I'd like to get in a first draft of "auto" mode for
> > > > > > > OCR'ing PDFs (TIKA-2749), but other than that, I'd be willing to run a
> > > > > > > release of 1.21 in the next few weeks.
> > > > > >   WDYT?
> > > > > >
> > > > > >         Best,
> > > > > >
> > > > > >                Tim
> > > > > >
> > > >
> > >
