Will kick off regression tests again shortly.

On Mon, May 6, 2019 at 8:37 PM Tim Allison <[email protected]> wrote:
>
> Houston, we have a problem...  The regression parsing took orders of
> magnitude longer than normal.  It is looking like something is going
> seriously wrong (different?) with rfc822 and mbox.
>
> The compression files are taking a good deal longer, too, but those
> might contain rfc822/mbox?!
>
> The following columns are hard to read, but...
>
> mimeA  mimeB totalTimeMillisA totalTimeMillisB  timeB/timeA
>
> "message/rfc822" "message/rfc822" "570881" "777473987" "  1361.885"
> "application/mbox" "application/mbox" "343222" "429843585" "  1252.378"
> "application/x-tar" "application/x-tar" "38245" "687057" "    17.965"
> "application/x-gtar" "application/x-gtar" "218006" "2664526" "    12.222"
> "application/x-lzma" "application/x-lzma" "30094" "242486" "     8.058"
> "application/gzip" "application/gzip" "1775051" "8663969" "     4.881"
> "application/x-xz" "application/x-xz" "279543" "1139716" "     4.077"
> "application/pkcs7-signature" "application/pkcs7-signature" "8323"
> "23089" "     2.774"
> "application/x-archive" "application/x-archive" "780542" "1856648"
> "     2.379"
> "application/vnd.ms-equation" "application/vnd.ms-equation" "85977"
> "201752" "     2.347"
> "image/vnd.adobe.photoshop" "image/vnd.adobe.photoshop" "11676"
> "25281" "     2.165"
> "text/x-c++src" "text/x-c++src" "13618" "29240" "     2.147"
> "text/html; charset=windows-1256" "text/html; charset=windows-1256"
> "5955" "12333" "     2.071"
> "text/x-vbasic; charset=windows-1252" "text/x-vbasic;
> charset=windows-1252" "150542" "260873" "     1.733"
> "application/xhtml+xml; charset=windows-1256" "application/xhtml+xml;
> charset=windows-1256" "23400" "39906" "     1.705"
> "application/javascript; charset=UTF-8" "application/javascript;
> charset=UTF-8" "14335" "24389" "     1.701"
> "text/csv; charset=ISO-8859-1" "text/csv; charset=ISO-8859-1;
> delimiter=comma" "153222" "256983" "     1.677"
> "application/java-vm" "application/java-vm" "138782" "229235" "     1.652"
> "text/csv; charset=windows-1252" "text/csv; charset=windows-1252;
> delimiter=comma" "136167" "218314" "     1.603"
> "application/vnd.ms-graph" "application/vnd.ms-graph" "37521" "57798"
> "     1.540"
> "application/xml" "application/xml" "5029015" "7678914" "     1.527"
> "text/x-java-source" "text/x-java-source" "38214" "58216" "     1.523"
> "application/java-archive" "application/java-archive" "344629"
> "520414" "     1.510"
> "application/x-bzip2" "application/x-bzip2" "88204" "123552" "     1.401"
> "application/x-tika-msoffice-embedded; format=comp_obj"
> "application/x-tika-msoffice-embedded; format=comp_obj" "22003"
> "30117" "     1.369"
> "text/x-log; charset=windows-1252" "text/x-log; charset=windows-1252"
> "26314" "35492" "     1.349"
> "application/x-sh; charset=ISO-8859-1" "application/x-sh;
> charset=ISO-8859-1" "32229" "42902" "     1.331"
> "application/x-shockwave-flash" "application/x-shockwave-flash"
> "146758" "194142" "     1.323"
> "application/vnd.ms-spreadsheetml" "application/vnd.ms-spreadsheetml"
> "21311" "27945" "     1.311"
> "image/x-portable-bitmap" "image/x-portable-bitmap" "32990" "41899"
> "     1.270"
> "text/plain; charset=windows-1252" "text/tsv; charset=windows-1252;
> delimiter=tab" "123082" "152588" "     1.240"
> "application/postscript" "application/postscript" "2137277" "2637633"
> "     1.234"
> "text/x-log; charset=ISO-8859-1" "text/x-log; charset=ISO-8859-1"
> "216432" "265464" "     1.227"
> "text/html; charset=US-ASCII" "text/html; charset=US-ASCII" "27717"
> "33919" "     1.224"
> "text/plain; charset=ISO-8859-1" "text/tsv; charset=ISO-8859-1;
> delimiter=tab" "278654" "329044" "     1.181"
> "application/rss+xml" "application/rss+xml" "120481" "141826" "     1.177"
> "application/x-rar-compressed" "application/x-rar-compressed" "18543"
> "21774" "     1.174"
> "application/vnd.google-earth.kml+xml"
> "application/vnd.google-earth.kml+xml" "55379" "64505" "     1.165"
> "application/vnd.ms-excel.sheet.4" "application/vnd.ms-excel.sheet.4"
> "108674" "126573" "     1.165"
> "image/wmf" "image/wmf" "1599014" "1856928" "     1.161"
> "image/emf" "image/emf" "1037173" "1203787" "     1.161"
> "application/x-tika-java-web-archive"
> "application/x-tika-java-web-archive" "30681" "35605" "     1.160"
> "text/html; charset=windows-1252" "text/html; charset=windows-1252"
> "1649080" "1838810" "     1.115"
> "text/html; charset=ISO-8859-1" "text/html; charset=ISO-8859-1"
> "2875476" "3176459" "     1.105"
> "application/vnd.ms-excel.sheet.binary.macroenabled.12"
> "application/vnd.ms-excel.sheet.binary.macroenabled.12" "47608"
> "52091" "     1.094"
> "application/vnd.ms-excel.sheet.macroenabled.12"
> "application/vnd.ms-excel.sheet.macroenabled.12" "537125" "584365"
> "     1.088"
> "application/vnd.openxmlformats-officedocument.spreadsheetml.template"
> "application/vnd.openxmlformats-officedocument.spreadsheetml.template"
> "16709" "18173" "     1.088"
> "application/vnd.ms-powerpoint" "application/vnd.ms-powerpoint"
> "21608104" "23194080" "     1.073"
> "application/epub+zip" "application/epub+zip" "730352" "780078" "     1.068"
> "application/x-tika-msoffice-embedded; format=ole10_native"
> "application/x-tika-msoffice-embedded; format=ole10_native" "105164"
> "111438" "     1.060"
> "application/vnd.wordperfect; version=5.1"
> "application/vnd.wordperfect; version=5.1" "17809" "18839"
> "     1.058"
> "image/x-pict" "image/x-pict" "84620" "89128" "     1.053"
> "text/plain; charset=windows-1252" "text/csv; charset=windows-1252;
> delimiter=comma" "620231" "652918" "     1.053"
> "application/xhtml+xml; charset=ISO-8859-1" "application/xhtml+xml;
> charset=ISO-8859-1" "643964" "668054" "     1.037"
> "application/vnd.wordperfect; version=6.x"
> "application/vnd.wordperfect; version=6.x" "11099" "11487"
> "     1.035"
> "application/vnd.ms-powerpoint.presentation.macroenabled.12"
> "application/vnd.ms-powerpoint.presentation.macroenabled.12" "86441"
> "88890" "     1.028"
> "text/plain; charset=ISO-8859-1" "text/csv; charset=ISO-8859-1;
> delimiter=comma" "837886" "858873" "     1.025"
> "image/bmp" "image/bmp" "93239" "95551" "     1.025"
> "application/rdf+xml" "application/rdf+xml" "27031" "27418" "     1.014"
> "application/xhtml+xml; charset=windows-1252" "application/xhtml+xml;
> charset=windows-1252" "155883" "156842" "     1.006"
> "image/gif" "image/gif" "1261751" "1249618" "      .990"
> "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"
> "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"
> "3690888" "3647507" "      .988"
> "application/vnd.openxmlformats-officedocument.presentationml.slideshow"
> "application/vnd.openxmlformats-officedocument.presentationml.slideshow"
> "973579" "956187" "      .982"
> "application/rtf" "application/rtf" "390546" "383240" "      .981"
> "text/html; charset=UTF-8" "text/html; charset=UTF-8" "1800532"
> "1757364" "      .976"
> "multipart/related" "multipart/related" "87826" "85500" "      .974"
> "application/zip" "application/zip" "2048721" "1961164" "      .957"
> "application/pdf" "application/pdf" "88918008" "84629858" "      .952"
> "application/vnd.ms-htmlhelp" "application/vnd.ms-htmlhelp" "10763"
> "10215" "      .949"
> "application/vnd.openxmlformats-officedocument.presentationml.presentation"
> "application/vnd.openxmlformats-officedocument.presentationml.presentation"
> "5208977" "4877451" "      .936"
> "application/vnd.ms-excel" "application/vnd.ms-excel" "9910545"
> "9259787" "      .934"
> "application/fits" "application/fits" "203383" "188616" "      .927"
> "application/vnd.openxmlformats-officedocument.wordprocessingml.template"
> "application/vnd.openxmlformats-officedocument.wordprocessingml.template"
> "99755" "92083" "      .923"
> "application/vnd.ms-word.document.macroenabled.12"
> "application/vnd.ms-word.document.macroenabled.12" "108546" "99368"
> "      .915"
> "application/x-tika-ooxml" "application/x-tika-ooxml" "16285504"
> "14863678" "      .913"
> "text/x-csrc; charset=ISO-8859-1" "text/x-csrc; charset=ISO-8859-1"
> "10061" "8988" "      .893"
> "application/x-tika-ooxml-protected"
> "application/x-tika-ooxml-protected" "76261" "67978" "      .891"
> "application/vnd.visio" "application/vnd.visio" "265591" "231051" "      .870"
> "application/vnd.openxmlformats-officedocument.wordprocessingml.document"
> "application/vnd.openxmlformats-officedocument.wordprocessingml.document"
> "12533893" "10606281" "      .846"
> "application/x-7z-compressed" "application/x-7z-compressed" "24736"
> "20670" "      .836"
> "application/vnd.android.package-archive"
> "application/vnd.android.package-archive" "63069" "52426" "      .831"
> "application/x-dbf; format=FoxBASE_plus" "application/x-dbf;
> format=FoxBASE_plus" "165078" "136581" "      .827"
> "text/plain; charset=windows-1252" "text/plain; charset=windows-1252"
> "2005334" "1617215" "      .806"
> "text/html; charset=EUC-JP" "text/html; charset=EUC-JP" "135216"
> "108702" "      .804"
> "application/x-msdownload" "application/x-msdownload" "22394" "17967"
> "      .802"
> "application/msword" "application/msword" "25225956" "20140865" "      .798"
> "application/xhtml+xml; charset=UTF-8" "application/xhtml+xml;
> charset=UTF-8" "1369233" "1084872" "      .792"
> "image/png" "image/png" "8024951" "6311528" "      .786"
> "text/plain; charset=UTF-8" "text/csv; charset=UTF-8; delimiter=comma"
> "102382" "80396" "      .785"
> "application/xhtml+xml; charset=Shift_JIS" "application/xhtml+xml;
> charset=Shift_JIS" "13145" "10310" "      .784"
> "text/plain; charset=UTF-8" "text/tsv; charset=UTF-8; delimiter=tab"
> "22314" "17184" "      .770"
> "text/x-perl; charset=ISO-8859-1" "text/x-perl; charset=ISO-8859-1"
> "19489" "14817" "      .760"
> "text/plain; charset=ISO-8859-1" "text/plain; charset=ISO-8859-1"
> "4776706" "3578705" "      .749"
> "text/html; charset=windows-1251" "text/html; charset=windows-1251"
> "38319" "28496" "      .744"
> "image/jpeg" "image/jpeg" "27188807" "19936262" "      .733"
> "application/x-bibtex-text-file; charset=UTF-8"
> "application/x-bibtex-text-file; charset=UTF-8" "15012" "10946"
> "      .729"
> "text/html; charset=GBK" "text/html; charset=GBK" "14763" "10547" "      .714"
> "application/octet-stream" "application/octet-stream" "637102"
> "450923" "      .708"
> "application/vnd.ms-wordml" "application/vnd.ms-wordml" "10319" "7289"
> "      .706"
> "text/plain; charset=UTF-8" "text/plain; charset=UTF-8" "431635"
> "303633" "      .703"
> "application/x-msaccess" "application/x-msaccess" "511222" "357344"
> "      .699"
> "application/vnd.google-earth.kmz" "application/vnd.google-earth.kmz"
> "178217" "124078" "      .696"
> "application/x-executable" "application/x-executable" "32426" "22437"
> "      .692"
> "application/x-ms-asx" "application/x-ms-asx" "81547" "54927" "      .674"
> "text/calendar; charset=UTF-8" "text/calendar; charset=UTF-8" "366484"
> "246181" "      .672"
> "text/x-vcalendar; charset=ISO-8859-1" "text/x-vcalendar;
> charset=ISO-8859-1" "37525" "25065" "      .668"
> "text/x-matlab; charset=UTF-8" "text/x-matlab; charset=UTF-8" "14861"
> "9868" "      .664"
> "application/x-tex; charset=ISO-8859-1" "application/x-tex;
> charset=ISO-8859-1" "19970" "13224" "      .662"
> "text/x-php; charset=ISO-8859-1" "text/x-php; charset=ISO-8859-1"
> "11407" "7497" "      .657"
> "application/x-grib" "application/x-grib" "16179" "10526" "      .651"
> "text/calendar; charset=ISO-8859-1" "text/calendar;
> charset=ISO-8859-1" "76839" "49991" "      .651"
> "text/plain; charset=UTF-16LE" "text/plain; charset=UTF-16LE" "23474"
> "15229" "      .649"
> "text/calendar; charset=windows-1252" "text/calendar;
> charset=windows-1252" "574421" "371225" "      .646"
> "application/x-bibtex-text-file; charset=windows-1252"
> "application/x-bibtex-text-file; charset=windows-1252" "49670" "32081"
> "      .646"
> "text/x-matlab; charset=windows-1252" "text/x-matlab;
> charset=windows-1252" "21890" "14083" "      .643"
> "application/x-bibtex-text-file; charset=ISO-8859-1"
> "application/x-bibtex-text-file; charset=ISO-8859-1" "82478" "52614"
> "      .638"
> "video/x-msvideo" "video/x-msvideo" "24521" "15634" "      .638"
> "text/x-vcard; charset=windows-1252" "text/x-vcard;
> charset=windows-1252" "43197" "27502" "      .637"
> "text/x-vcard; charset=ISO-8859-1" "text/x-vcard; charset=ISO-8859-1"
> "38629" "24563" "      .636"
> "text/x-matlab; charset=ISO-8859-1" "text/x-matlab;
> charset=ISO-8859-1" "66814" "41767" "      .625"
> "text/x-diff; charset=ISO-8859-1" "text/x-diff; charset=ISO-8859-1"
> "28490" "17717" "      .622"
> "video/mpeg" "video/mpeg" "99047" "60633" "      .612"
> "application/vnd.openxmlformats-officedocument"
> "application/vnd.openxmlformats-officedocument" "14021" "8541"
> "      .609"
> "application/x-sas-data" "application/x-sas-data" "20148" "12085" "      .600"
> "application/x-endnote-refer" "application/x-endnote-refer" "37721"
> "22427" "      .595"
> "audio/vnd.wave" "audio/vnd.wave" "2025503" "1188124" "      .587"
> "image/vnd.dwg" "image/vnd.dwg" "54326" "31554" "      .581"
> "application/x-mspublisher" "application/x-mspublisher" "827994"
> "479060" "      .579"
> "video/x-flv" "video/x-flv" "17035" "9604" "      .564"
> "audio/x-flac" "audio/x-flac" "153508" "86400" "      .563"
> "video/quicktime" "video/quicktime" "14326" "7974" "      .557"
> "application/x-netcdf" "application/x-netcdf" "150752" "82587" "      .548"
> "application/x-dvi" "application/x-dvi" "52402" "28348" "      .541"
> "image/tiff" "image/tiff" "549514" "296830" "      .540"
> "application/x-tika-msoffice" "application/x-tika-msoffice" "5057216"
> "2645255" "      .523"
> "application/x-mobipocket-ebook" "application/x-mobipocket-ebook"
> "139851" "72051" "      .515"
> "application/x-shapefile" "application/x-shapefile" "13493" "6946"
> "      .515"
> "application/x-hdf" "application/x-hdf" "66076" "33825" "      .512"
> "image/vnd.dxf; format=ascii" "image/vnd.dxf; format=ascii" "117691"
> "59991" "      .510"
> "audio/x-ms-wma" "audio/x-ms-wma" "21392" "10785" "      .504"
> "image/vnd.djvu" "image/vnd.djvu" "186478" "93119" "      .499"
> "image/jp2" "image/jp2" "538909" "265405" "      .492"
> "application/vnd.rn-realmedia" "application/vnd.rn-realmedia" "11677"
> "5678" "      .486"
> "video/x-ms-asf" "video/x-ms-asf" "17676" "8387" "      .474"
> "audio/x-aiff" "audio/x-aiff" "97338" "45472" "      .467"
> "video/x-m4v" "video/x-m4v" "13351" "6207" "      .465"
> "audio/mpeg" "audio/mpeg" "1702142" "782377" "      .460"
> "image/x-portable-pixmap" "image/x-portable-pixmap" "29323" "13385"
> "      .456"
> "application/xhtml+xml; charset=windows-1251" "application/xhtml+xml;
> charset=windows-1251" "14584" "6591" "      .452"
> "video/3gpp" "video/3gpp" "54044" "23964" "      .443"
> "video/x-ms-wmv" "video/x-ms-wmv" "58120" "25581" "      .440"
> "video/quicktime" "application/mp4" "28043" "11520" "      .411"
> "video/mp4" "video/mp4" "95073" "39042" "      .411"
> "application/vnd.apple.keynote" "application/vnd.apple.keynote"
> "17162" "6532" "      .381"
> "application/mp4" "application/mp4" "160396" "48739" "      .304"
> "application/x-ms-installer" "application/x-ms-installer" "22090"
> "4952" "      .224"
>
> On Fri, May 3, 2019 at 1:56 PM Tim Allison <[email protected]> wrote:
> >
> > All,
> >   I've kicked off the regression tests.  I should have results by
> > Tuesday.  Let me know if there's anything else you'd like to get in
> > before 1.21.  I can rerun the regression tests on Monday if desired.
> >
> >           Cheers,
> >
> >                     Tim
> >
> > On Tue, Apr 23, 2019 at 8:21 PM Konstantin Gribov <[email protected]> wrote:
> > >
> > > Tim,
> > >
> > > I'm +1 since I've pushed TIKA-2555/TIKA-2601. But I'm going to look through
> > > ossindex-maven-plugin:audit results.
> > >
> > > Maybe I'll do some cleanup (like using lambdas instead of anonymous
> > > classes, diamond op etc) but that's not a blocker ;)
> > >
> > > --
> > > Best regards,
> > > Konstantin Gribov.
> > >
> > >
> > > On Tue, Apr 23, 2019 at 9:04 AM Oleg Tikhonov <[email protected]> wrote:
> > >
> > > > +1 to wait if needed.
> > > >
> > > > On Mon, Apr 22, 2019, 23:23 Tim Allison <[email protected]> wrote:
> > > >
> > > > > All,
> > > > >   I just made a bunch of upgrades to our dependencies.  I still want
> > > > > to take a first pass at TIKA-2749...maybe by the end of this week with
> > > > > release process kicking off the following week?  I could start the
> > > > > regression tests now (well, tomorrowish), though, unless anyone has
> > > > > anything they want to get in...I'm happy to wait, though, till next
> > > > > week to start the regression tests.
> > > > >  WDYT?
> > > > >
> > > > >        Cheers,
> > > > >
> > > > >                Tim
> > > > >
> > > > > > On Mon, Apr 8, 2019 at 2:25 PM Oleg Tikhonov <[email protected]> wrote:
> > > > > >
> > > > > > Great!
> > > > > > +1.
> > > > > > Thanks,
> > > > > > Oleg
> > > > > >
> > > > > > On Mon, Apr 8, 2019, 21:11 Tim Allison <[email protected]> wrote:
> > > > > >
> > > > > > > All,
> > > > > > >   PDFBox will be out in a few days, and POI should be out soon as
> > > > > > > > well.  I _think_ I'd like to get in a first draft of "auto" mode for
> > > > > > > > OCR'ing PDFs (TIKA-2749), but other than that, I'd be willing to run a
> > > > > > > > release of 1.21 in the next few weeks.
> > > > > > >   WDYT?
> > > > > > >
> > > > > > >         Best,
> > > > > > >
> > > > > > >                Tim
> > > > > > >
> > > > >
> > > >
