Will kick off regression tests again shortly.
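
For comparing the rerun against these numbers, here is a minimal sketch (not part of the original regression tooling) of how the timeB/timeA column in the quoted table could be recomputed and the worst slowdowns flagged. The function name `slowdowns`, the 2.0 threshold, and the assumption that each row carries four quoted fields (mimeA, mimeB, totalTimeMillisA, totalTimeMillisB) are illustrative only; the sample rows are copied from the quoted table.

```python
import shlex

# Sample rows copied from the quoted table: mimeA mimeB totalTimeMillisA totalTimeMillisB
SAMPLE = '''\
"message/rfc822" "message/rfc822" "570881" "777473987"
"application/mbox" "application/mbox" "343222" "429843585"
"application/x-tar" "application/x-tar" "38245" "687057"
"application/pdf" "application/pdf" "88918008" "84629858"
'''


def slowdowns(text, threshold=2.0):
    """Return (mimeA, mimeB, ratio) rows where timeB/timeA >= threshold, slowest first.

    The threshold is a hypothetical cut-off, not a value used by the project.
    """
    flagged = []
    for line in text.splitlines():
        # shlex keeps quoted values such as "text/html; charset=UTF-8" as one field
        fields = shlex.split(line)
        if len(fields) < 4:
            continue  # skip blank or malformed rows
        mime_a, mime_b = fields[0], fields[1]
        time_a, time_b = int(fields[2]), int(fields[3])
        if time_a == 0:
            continue  # avoid division by zero
        ratio = time_b / time_a
        if ratio >= threshold:
            flagged.append((mime_a, mime_b, ratio))
    return sorted(flagged, key=lambda row: row[2], reverse=True)


if __name__ == "__main__":
    for mime_a, mime_b, ratio in slowdowns(SAMPLE):
        print(f"{mime_a} -> {mime_b}: {ratio:.3f}x")
```

With the sample rows, rfc822, mbox, and tar are flagged, while pdf (which got slightly faster) is not.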
On Mon, May 6, 2019 at 8:37 PM Tim Allison <[email protected]> wrote:
>
> Houston, we have a problem... The regression parsing took orders of
> magnitude longer than normal. It is looking like something is going
> seriously wrong (different?) with rfc822 and mbox.
>
> The compression files are taking a good deal longer, too, but those
> might contain rfc822/mbox?!
>
> The following columns are hard to read, but...
>
> mimeA mimeB totalTimeMillisA totalTimeMillisB timeB/timeA
>
> "message/rfc822" "message/rfc822" "570881" "777473987" "1361.885"
> "application/mbox" "application/mbox" "343222" "429843585" "1252.378"
> "application/x-tar" "application/x-tar" "38245" "687057" "17.965"
> "application/x-gtar" "application/x-gtar" "218006" "2664526" "12.222"
> "application/x-lzma" "application/x-lzma" "30094" "242486" "8.058"
> "application/gzip" "application/gzip" "1775051" "8663969" "4.881"
> "application/x-xz" "application/x-xz" "279543" "1139716" "4.077"
> "application/pkcs7-signature" "application/pkcs7-signature" "8323" "23089" "2.774"
> "application/x-archive" "application/x-archive" "780542" "1856648" "2.379"
> "application/vnd.ms-equation" "application/vnd.ms-equation" "85977" "201752" "2.347"
> "image/vnd.adobe.photoshop" "image/vnd.adobe.photoshop" "11676" "25281" "2.165"
> "text/x-c++src" "text/x-c++src" "13618" "29240" "2.147"
> "text/html; charset=windows-1256" "text/html; charset=windows-1256" "5955" "12333" "2.071"
> "text/x-vbasic; charset=windows-1252" "text/x-vbasic; charset=windows-1252" "150542" "260873" "1.733"
> "application/xhtml+xml; charset=windows-1256" "application/xhtml+xml; charset=windows-1256" "23400" "39906" "1.705"
> "application/javascript; charset=UTF-8" "application/javascript; charset=UTF-8" "14335" "24389" "1.701"
> "text/csv; charset=ISO-8859-1" "text/csv; charset=ISO-8859-1; delimiter=comma" "153222" "256983" "1.677"
> "application/java-vm" "application/java-vm" "138782" "229235" "1.652"
> "text/csv; charset=windows-1252" "text/csv; charset=windows-1252; delimiter=comma" "136167" "218314" "1.603"
> "application/vnd.ms-graph" "application/vnd.ms-graph" "37521" "57798" "1.540"
> "application/xml" "application/xml" "5029015" "7678914" "1.527"
> "text/x-java-source" "text/x-java-source" "38214" "58216" "1.523"
> "application/java-archive" "application/java-archive" "344629" "520414" "1.510"
> "application/x-bzip2" "application/x-bzip2" "88204" "123552" "1.401"
> "application/x-tika-msoffice-embedded; format=comp_obj" "application/x-tika-msoffice-embedded; format=comp_obj" "22003" "30117" "1.369"
> "text/x-log; charset=windows-1252" "text/x-log; charset=windows-1252" "26314" "35492" "1.349"
> "application/x-sh; charset=ISO-8859-1" "application/x-sh; charset=ISO-8859-1" "32229" "42902" "1.331"
> "application/x-shockwave-flash" "application/x-shockwave-flash" "146758" "194142" "1.323"
> "application/vnd.ms-spreadsheetml" "application/vnd.ms-spreadsheetml" "21311" "27945" "1.311"
> "image/x-portable-bitmap" "image/x-portable-bitmap" "32990" "41899" "1.270"
> "text/plain; charset=windows-1252" "text/tsv; charset=windows-1252; delimiter=tab" "123082" "152588" "1.240"
> "application/postscript" "application/postscript" "2137277" "2637633" "1.234"
> "text/x-log; charset=ISO-8859-1" "text/x-log; charset=ISO-8859-1" "216432" "265464" "1.227"
> "text/html; charset=US-ASCII" "text/html; charset=US-ASCII" "27717" "33919" "1.224"
> "text/plain; charset=ISO-8859-1" "text/tsv; charset=ISO-8859-1; delimiter=tab" "278654" "329044" "1.181"
> "application/rss+xml" "application/rss+xml" "120481" "141826" "1.177"
> "application/x-rar-compressed" "application/x-rar-compressed" "18543" "21774" "1.174"
> "application/vnd.google-earth.kml+xml" "application/vnd.google-earth.kml+xml" "55379" "64505" "1.165"
> "application/vnd.ms-excel.sheet.4" "application/vnd.ms-excel.sheet.4" "108674" "126573" "1.165"
> "image/wmf" "image/wmf" "1599014" "1856928" "1.161"
> "image/emf" "image/emf" "1037173" "1203787" "1.161"
> "application/x-tika-java-web-archive" "application/x-tika-java-web-archive" "30681" "35605" "1.160"
> "text/html; charset=windows-1252" "text/html; charset=windows-1252" "1649080" "1838810" "1.115"
> "text/html; charset=ISO-8859-1" "text/html; charset=ISO-8859-1" "2875476" "3176459" "1.105"
> "application/vnd.ms-excel.sheet.binary.macroenabled.12" "application/vnd.ms-excel.sheet.binary.macroenabled.12" "47608" "52091" "1.094"
> "application/vnd.ms-excel.sheet.macroenabled.12" "application/vnd.ms-excel.sheet.macroenabled.12" "537125" "584365" "1.088"
> "application/vnd.openxmlformats-officedocument.spreadsheetml.template" "application/vnd.openxmlformats-officedocument.spreadsheetml.template" "16709" "18173" "1.088"
> "application/vnd.ms-powerpoint" "application/vnd.ms-powerpoint" "21608104" "23194080" "1.073"
> "application/epub+zip" "application/epub+zip" "730352" "780078" "1.068"
> "application/x-tika-msoffice-embedded; format=ole10_native" "application/x-tika-msoffice-embedded; format=ole10_native" "105164" "111438" "1.060"
> "application/vnd.wordperfect; version=5.1" "application/vnd.wordperfect; version=5.1" "17809" "18839" "1.058"
> "image/x-pict" "image/x-pict" "84620" "89128" "1.053"
> "text/plain; charset=windows-1252" "text/csv; charset=windows-1252; delimiter=comma" "620231" "652918" "1.053"
> "application/xhtml+xml; charset=ISO-8859-1" "application/xhtml+xml; charset=ISO-8859-1" "643964" "668054" "1.037"
> "application/vnd.wordperfect; version=6.x" "application/vnd.wordperfect; version=6.x" "11099" "11487" "1.035"
> "application/vnd.ms-powerpoint.presentation.macroenabled.12" "application/vnd.ms-powerpoint.presentation.macroenabled.12" "86441" "88890" "1.028"
> "text/plain; charset=ISO-8859-1" "text/csv; charset=ISO-8859-1; delimiter=comma" "837886" "858873" "1.025"
> "image/bmp" "image/bmp" "93239" "95551" "1.025"
> "application/rdf+xml" "application/rdf+xml" "27031" "27418" "1.014"
> "application/xhtml+xml; charset=windows-1252" "application/xhtml+xml; charset=windows-1252" "155883" "156842" "1.006"
> "image/gif" "image/gif" "1261751" "1249618" ".990"
> "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet" "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet" "3690888" "3647507" ".988"
> "application/vnd.openxmlformats-officedocument.presentationml.slideshow" "application/vnd.openxmlformats-officedocument.presentationml.slideshow" "973579" "956187" ".982"
> "application/rtf" "application/rtf" "390546" "383240" ".981"
> "text/html; charset=UTF-8" "text/html; charset=UTF-8" "1800532" "1757364" ".976"
> "multipart/related" "multipart/related" "87826" "85500" ".974"
> "application/zip" "application/zip" "2048721" "1961164" ".957"
> "application/pdf" "application/pdf" "88918008" "84629858" ".952"
> "application/vnd.ms-htmlhelp" "application/vnd.ms-htmlhelp" "10763" "10215" ".949"
> "application/vnd.openxmlformats-officedocument.presentationml.presentation" "application/vnd.openxmlformats-officedocument.presentationml.presentation" "5208977" "4877451" ".936"
> "application/vnd.ms-excel" "application/vnd.ms-excel" "9910545" "9259787" ".934"
> "application/fits" "application/fits" "203383" "188616" ".927"
> "application/vnd.openxmlformats-officedocument.wordprocessingml.template" "application/vnd.openxmlformats-officedocument.wordprocessingml.template" "99755" "92083" ".923"
> "application/vnd.ms-word.document.macroenabled.12" "application/vnd.ms-word.document.macroenabled.12" "108546" "99368" ".915"
> "application/x-tika-ooxml" "application/x-tika-ooxml" "16285504" "14863678" ".913"
> "text/x-csrc; charset=ISO-8859-1" "text/x-csrc; charset=ISO-8859-1" "10061" "8988" ".893"
> "application/x-tika-ooxml-protected" "application/x-tika-ooxml-protected" "76261" "67978" ".891"
> "application/vnd.visio" "application/vnd.visio" "265591" "231051" ".870"
> "application/vnd.openxmlformats-officedocument.wordprocessingml.document" "application/vnd.openxmlformats-officedocument.wordprocessingml.document" "12533893" "10606281" ".846"
> "application/x-7z-compressed" "application/x-7z-compressed" "24736" "20670" ".836"
> "application/vnd.android.package-archive" "application/vnd.android.package-archive" "63069" "52426" ".831"
> "application/x-dbf; format=FoxBASE_plus" "application/x-dbf; format=FoxBASE_plus" "165078" "136581" ".827"
> "text/plain; charset=windows-1252" "text/plain; charset=windows-1252" "2005334" "1617215" ".806"
> "text/html; charset=EUC-JP" "text/html; charset=EUC-JP" "135216" "108702" ".804"
> "application/x-msdownload" "application/x-msdownload" "22394" "17967" ".802"
> "application/msword" "application/msword" "25225956" "20140865" ".798"
> "application/xhtml+xml; charset=UTF-8" "application/xhtml+xml; charset=UTF-8" "1369233" "1084872" ".792"
> "image/png" "image/png" "8024951" "6311528" ".786"
> "text/plain; charset=UTF-8" "text/csv; charset=UTF-8; delimiter=comma" "102382" "80396" ".785"
> "application/xhtml+xml; charset=Shift_JIS" "application/xhtml+xml; charset=Shift_JIS" "13145" "10310" ".784"
> "text/plain; charset=UTF-8" "text/tsv; charset=UTF-8; delimiter=tab" "22314" "17184" ".770"
> "text/x-perl; charset=ISO-8859-1" "text/x-perl; charset=ISO-8859-1" "19489" "14817" ".760"
> "text/plain; charset=ISO-8859-1" "text/plain; charset=ISO-8859-1" "4776706" "3578705" ".749"
> "text/html; charset=windows-1251" "text/html; charset=windows-1251" "38319" "28496" ".744"
> "image/jpeg" "image/jpeg" "27188807" "19936262" ".733"
> "application/x-bibtex-text-file; charset=UTF-8" "application/x-bibtex-text-file; charset=UTF-8" "15012" "10946" ".729"
> "text/html; charset=GBK" "text/html; charset=GBK" "14763" "10547" ".714"
> "application/octet-stream" "application/octet-stream" "637102" "450923" ".708"
> "application/vnd.ms-wordml" "application/vnd.ms-wordml" "10319" "7289" ".706"
> "text/plain; charset=UTF-8" "text/plain; charset=UTF-8" "431635" "303633" ".703"
> "application/x-msaccess" "application/x-msaccess" "511222" "357344" ".699"
> "application/vnd.google-earth.kmz" "application/vnd.google-earth.kmz" "178217" "124078" ".696"
> "application/x-executable" "application/x-executable" "32426" "22437" ".692"
> "application/x-ms-asx" "application/x-ms-asx" "81547" "54927" ".674"
> "text/calendar; charset=UTF-8" "text/calendar; charset=UTF-8" "366484" "246181" ".672"
> "text/x-vcalendar; charset=ISO-8859-1" "text/x-vcalendar; charset=ISO-8859-1" "37525" "25065" ".668"
> "text/x-matlab; charset=UTF-8" "text/x-matlab; charset=UTF-8" "14861" "9868" ".664"
> "application/x-tex; charset=ISO-8859-1" "application/x-tex; charset=ISO-8859-1" "19970" "13224" ".662"
> "text/x-php; charset=ISO-8859-1" "text/x-php; charset=ISO-8859-1" "11407" "7497" ".657"
> "application/x-grib" "application/x-grib" "16179" "10526" ".651"
> "text/calendar; charset=ISO-8859-1" "text/calendar; charset=ISO-8859-1" "76839" "49991" ".651"
> "text/plain; charset=UTF-16LE" "text/plain; charset=UTF-16LE" "23474" "15229" ".649"
> "text/calendar; charset=windows-1252" "text/calendar; charset=windows-1252" "574421" "371225" ".646"
> "application/x-bibtex-text-file; charset=windows-1252" "application/x-bibtex-text-file; charset=windows-1252" "49670" "32081" ".646"
> "text/x-matlab; charset=windows-1252" "text/x-matlab; charset=windows-1252" "21890" "14083" ".643"
> "application/x-bibtex-text-file; charset=ISO-8859-1" "application/x-bibtex-text-file; charset=ISO-8859-1" "82478" "52614" ".638"
> "video/x-msvideo" "video/x-msvideo" "24521" "15634" ".638"
> "text/x-vcard; charset=windows-1252" "text/x-vcard; charset=windows-1252" "43197" "27502" ".637"
> "text/x-vcard; charset=ISO-8859-1" "text/x-vcard; charset=ISO-8859-1" "38629" "24563" ".636"
> "text/x-matlab; charset=ISO-8859-1" "text/x-matlab; charset=ISO-8859-1" "66814" "41767" ".625"
> "text/x-diff; charset=ISO-8859-1" "text/x-diff; charset=ISO-8859-1" "28490" "17717" ".622"
> "video/mpeg" "video/mpeg" "99047" "60633" ".612"
> "application/vnd.openxmlformats-officedocument" "application/vnd.openxmlformats-officedocument" "14021" "8541" ".609"
> "application/x-sas-data" "application/x-sas-data" "20148" "12085" ".600"
> "application/x-endnote-refer" "application/x-endnote-refer" "37721" "22427" ".595"
> "audio/vnd.wave" "audio/vnd.wave" "2025503" "1188124" ".587"
> "image/vnd.dwg" "image/vnd.dwg" "54326" "31554" ".581"
> "application/x-mspublisher" "application/x-mspublisher" "827994" "479060" ".579"
> "video/x-flv" "video/x-flv" "17035" "9604" ".564"
> "audio/x-flac" "audio/x-flac" "153508" "86400" ".563"
> "video/quicktime" "video/quicktime" "14326" "7974" ".557"
> "application/x-netcdf" "application/x-netcdf" "150752" "82587" ".548"
> "application/x-dvi" "application/x-dvi" "52402" "28348" ".541"
> "image/tiff" "image/tiff" "549514" "296830" ".540"
> "application/x-tika-msoffice" "application/x-tika-msoffice" "5057216" "2645255" ".523"
> "application/x-mobipocket-ebook" "application/x-mobipocket-ebook" "139851" "72051" ".515"
> "application/x-shapefile" "application/x-shapefile" "13493" "6946" ".515"
> "application/x-hdf" "application/x-hdf" "66076" "33825" ".512"
> "image/vnd.dxf; format=ascii" "image/vnd.dxf; format=ascii" "117691" "59991" ".510"
> "audio/x-ms-wma" "audio/x-ms-wma" "21392" "10785" ".504"
> "image/vnd.djvu" "image/vnd.djvu" "186478" "93119" ".499"
> "image/jp2" "image/jp2" "538909" "265405" ".492"
> "application/vnd.rn-realmedia" "application/vnd.rn-realmedia" "11677" "5678" ".486"
> "video/x-ms-asf" "video/x-ms-asf" "17676" "8387" ".474"
> "audio/x-aiff" "audio/x-aiff" "97338" "45472" ".467"
> "video/x-m4v" "video/x-m4v" "13351" "6207" ".465"
> "audio/mpeg" "audio/mpeg" "1702142" "782377" ".460"
> "image/x-portable-pixmap" "image/x-portable-pixmap" "29323" "13385" ".456"
> "application/xhtml+xml; charset=windows-1251" "application/xhtml+xml; charset=windows-1251" "14584" "6591" ".452"
> "video/3gpp" "video/3gpp" "54044" "23964" ".443"
> "video/x-ms-wmv" "video/x-ms-wmv" "58120" "25581" ".440"
> "video/quicktime" "application/mp4" "28043" "11520" ".411"
> "video/mp4" "video/mp4" "95073" "39042" ".411"
> "application/vnd.apple.keynote" "application/vnd.apple.keynote" "17162" "6532" ".381"
> "application/mp4" "application/mp4" "160396" "48739" ".304"
> "application/x-ms-installer" "application/x-ms-installer" "22090" "4952" ".224"
>
> On Fri, May 3, 2019 at 1:56 PM Tim Allison <[email protected]> wrote:
> >
> > All,
> > I've kicked off the regression tests. I should have results by
> > Tuesday. Let me know if there's anything else you'd like to get in
> > before 1.21. I can rerun the regression tests on Monday if desired.
> >
> > Cheers,
> >
> > Tim
> >
> > On Tue, Apr 23, 2019 at 8:21 PM Konstantin Gribov <[email protected]> wrote:
> > >
> > > Tim,
> > >
> > > I'm +1 since I've pushed TIKA-2555/TIKA-2601. But I'm going to look through
> > > ossindex-maven-plugin:audit results.
> > >
> > > Maybe I'll do some cleanup (like using lambdas instead of anonymous
> > > classes, diamond op etc) but that's not a blocker ;)
> > >
> > > --
> > > Best regards,
> > > Konstantin Gribov.
> > >
> > >
> > > On Tue, Apr 23, 2019 at 9:04 AM Oleg Tikhonov <[email protected]> wrote:
> > >
> > > > +1 to wait if needed.
> > > >
> > > > On Mon, Apr 22, 2019, 23:23 Tim Allison <[email protected]> wrote:
> > > >
> > > > > All,
> > > > > I just made a bunch of upgrades to our dependencies. I still want
> > > > > to take a first pass at TIKA-2749...maybe by the end of this week with
> > > > > release process kicking off the following week? I could start the
> > > > > regression tests now (well, tomorrowish), though, unless anyone has
> > > > > anything they want to get in...I'm happy to wait, though, till next
> > > > > week to start the regression tests.
> > > > > WDYT?
> > > > >
> > > > > Cheers,
> > > > >
> > > > > Tim
> > > > >
> > > > > On Mon, Apr 8, 2019 at 2:25 PM Oleg Tikhonov <[email protected]> wrote:
> > > > > >
> > > > > > Great!
> > > > > > +1.
> > > > > > Thanks,
> > > > > > Oleg
> > > > > >
> > > > > > On Mon, Apr 8, 2019, 21:11 Tim Allison <[email protected]> wrote:
> > > > > >
> > > > > > > All,
> > > > > > > PDFBox will be out in a few days, and POI should be out soon as
> > > > > > > well. I _think_ I'd like to get in a first draft of "auto" mode for
> > > > > > > OCR'ing PDFs (TIKA-2749), but other than that, I'd be willing to run a
> > > > > > > release of 1.21 in the next few weeks.
> > > > > > > WDYT?
> > > > > > >
> > > > > > > Best,
> > > > > > >
> > > > > > > Tim
