All,
Reports are here:
http://162.242.228.174/reports/reports_tika_1_21-pre-rc1.zip

I don't see any blockers...some areas for improvement here and there,
but nothing awful. Please do take a look.

If no one finds anything surprising, I'll roll the rc1 on Monday
around 1pm(ish) UTC.

Cheers,

Tim

On Tue, May 7, 2019 at 5:04 PM Tim Allison wrote:
>
> Will kick off regression tests again shortly.
>
> On Mon, May 6, 2019 at 8:37 PM Tim Allison wrote:
> >
> > Houston, we have a problem... The regression parsing took orders of
> > magnitude longer than normal. It is looking like something is going
> > seriously wrong (different?) with rfc822 and mbox.
> >
> > The compression files are taking a good deal longer, too, but those
> > might contain rfc822/mbox?!
> >
> > The following columns are hard to read, but...
> >
> > mimeA mimeB totalTimeMillisA totalTimeMillisB timeB/timeA
> >
> > "message/rfc822" "message/rfc822" "570881" "777473987" " 1361.885"
> > "application/mbox" "application/mbox" "343222" "429843585" " 1252.378"
> > "application/x-tar" "application/x-tar" "38245" "687057" " 17.965"
> > "application/x-gtar" "application/x-gtar" "218006" "2664526" " 12.222"
> > "application/x-lzma" "application/x-lzma" "30094" "242486" " 8.058"
> > "application/gzip" "application/gzip" "1775051" "8663969" " 4.881"
> > "application/x-xz" "application/x-xz" "279543" "1139716" " 4.077"
> > "application/pkcs7-signature" "application/pkcs7-signature" "8323" "23089" " 2.774"
> > "application/x-archive" "application/x-archive" "780542" "1856648" " 2.379"
> > "application/vnd.ms-equation" "application/vnd.ms-equation" "85977" "201752" " 2.347"
> > "image/vnd.adobe.photoshop" "image/vnd.adobe.photoshop" "11676" "25281" " 2.165"
> > "text/x-c++src" "text/x-c++src" "13618" "29240" " 2.147"
> > "text/html; charset=windows-1256" "text/html; charset=windows-1256" "5955" "12333" " 2.071"
> > "text/x-vbasic; charset=windows-1252" "text/x-vbasic; charset=windows-1252" "150542" "260873" " 1.733"
> > "application/xhtml+xml; charset=windows-1256" "application/xhtml+xml; charset=windows-1256" "23400" "39906" " 1.705"
> > "application/javascript; charset=UTF-8" "application/javascript; charset=UTF-8" "14335" "24389" " 1.701"
> > "text/csv; charset=ISO-8859-1" "text/csv; charset=ISO-8859-1; delimiter=comma" "153222" "256983" " 1.677"
> > "application/java-vm" "application/java-vm" "138782" "229235" " 1.652"
> > "text/csv; charset=windows-1252" "text/csv; charset=windows-1252; delimiter=comma" "136167" "218314" " 1.603"
> > "application/vnd.ms-graph" "application/vnd.ms-graph" "37521" "57798" " 1.540"
> > "application/xml" "application/xml" "5029015" "7678914" " 1.527"
> > "text/x-java-source" "text/x-java-source" "38214" "58216" " 1.523"
> > "application/java-archive" "application/java-archive" "344629" "520414" " 1.510"
> > "application/x-bzip2" "application/x-bzip2" "88204" "123552" " 1.401"
> > "application/x-tika-msoffice-embedded; format=comp_obj" "application/x-tika-msoffice-embedded; format=comp_obj" "22003" "30117" " 1.369"
> > "text/x-log; charset=windows-1252" "text/x-log; charset=windows-1252" "26314" "35492" " 1.349"
> > "application/x-sh; charset=ISO-8859-1" "application/x-sh; charset=ISO-8859-1" "32229" "42902" " 1.331"
> > "application/x-shockwave-flash" "application/x-shockwave-flash" "146758" "194142" " 1.323"
> > "application/vnd.ms-spreadsheetml" "application/vnd.ms-spreadsheetml" "21311" "27945" " 1.311"
> > "image/x-portable-bitmap" "image/x-portable-bitmap" "32990" "41899" " 1.270"
> > "text/plain; charset=windows-1252" "text/tsv; charset=windows-1252; delimiter=tab" "123082" "152588" " 1.240"
> > "application/postscript" "application/postscript" "2137277" "2637633" " 1.234"
> > "text/x-log; charset=ISO-8859-1" "text/x-log; charset=ISO-8859-1" "216432" "265464" " 1.227"
> > "text/html; charset=US-ASCII" "text/html; charset=US-ASCII" "27717" "33919" " 1.224"
> > "text/plain; charset=ISO-8859-1" "text/tsv; charset=ISO-8859-1; delimiter=tab" "278654" "329044" " 1.181"
> > "application/rss+xml" "application/rss+xml" "120481" "141826" " 1.177"
> > "application/x-rar-compressed" "application/x-rar-compressed" "18543" "21774" " 1.174"
> > "application/vnd.google-earth.kml+xml" "application/vnd.google-earth.kml+xml" "55379" "64505" " 1.165"
> > "application/vnd.ms-excel.sheet.4" "application/vnd.ms-excel.sheet.4" "108674" "126573" " 1.165"
> > "image/wmf" "image/wmf" "1599014" "1856928" " 1.161"
> > "image/emf" "image/emf" "1037173" "1203787" " 1.161"
> > "application/x-tika-java-web-archive" "application/x-tika-java-web-archive" "30681" "35605" " 1.160"
> > "text/html; charset=windows-1252" "text/html; charset=windows-1252" "1649080" "1838810" " 1.115"
> > "text/html; charset=ISO-8859-1" "text/html; charset=ISO-8859-1" "2875476" "3176459" " 1.105"
> > "application/vnd.ms-excel.sheet.binary.macroenabled.12" "application/vnd.ms-excel.sheet.binary.macroenabled.12" "47608" "52091" " 1.094"
> > "application/vnd.ms-excel.sheet.macroenabled.12" "application/vnd.ms-excel.sheet.macroenabled.12" "537125" "584365" " 1.088"
> > "application/vnd.openxmlformats-officedocument.spreadsheetml.template" "application/vnd.openxmlformats-officedocument.spreadsheetml.template" "16709" "18173" " 1.088"
> > "application/vnd.ms-powerpoint" "application/vnd.ms-powerpoint" "21608104" "23194080" " 1.073"
> > "application/epub+zip" "application/epub+zip" "730352" "780078" " 1.068"
> > "application/x-tika-msoffice-embedded; format=ole10_native" "application/x-tika-msoffice-embedded; format=ole10_native" "105164" "111438" " 1.060"
> > "application/vnd.wordperfect; version=5.1" "application/vnd.wordperfect; version=5.1" "17809" "18839" " 1.058"
> > "image/x-pict" "image/x-pict" "84620" "89128" " 1.053"
> > "text/plain; charset=windows-1252" "text/csv; charset=windows-1252; delimiter=comma" "620231" "652918" " 1.053"
> > "application/xhtml+xml; charset=ISO-8859-1" "application/xhtml+xml; charset=ISO-8859-1" "643964" "668054" " 1.037"
> > "application/vnd.wordperfect; version=6.x" "application/vnd.wordperfect; version=6.x" "11099" "11487" " 1.035"
> > "application/vnd.ms-powerpoint.presentation.macroenabled.12" "application/vnd.ms-powerpoint.presentation.macroenabled.12" "86441" "88890" " 1.028"
> > "text/plain; charset=ISO-8859-1" "text/csv; charset=ISO-8859-1; delimiter=comma" "837886" "858873" " 1.025"
> > "image/bmp" "image/bmp" "93239" "95551" " 1.025"
> > "application/rdf+xml" "application/rdf+xml" "27031" "27418" " 1.014"
> > "application/xhtml+xml; charset=windows-1252" "application/xhtml+xml; charset=windows-1252" "155883" "156842" " 1.006"
> > "image/gif" "image/gif" "1261751" "1249618" " .990"
> > "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet" "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet" "3690888" "3647507" " .988"
> > "application/vnd.openxmlformats-officedocument.presentationml.slideshow" "application/vnd.openxmlformats-officedocument.presentationml.slideshow" "973579" "956187" " .982"
> > "application/rtf" "application/rtf" "390546" "383240" " .981"
> > "text/html; charset=UTF-8" "text/html; charset=UTF-8" "1800532" "1757364" " .976"
> > "multipart/related" "multipart/related" "87826" "85500" " .974"
> > "application/zip" "application/zip" "2048721" "1961164" " .957"
> > "application/pdf" "application/pdf" "88918008" "84629858" " .952"
> > "application/vnd.ms-htmlhelp" "application/vnd.ms-htmlhelp" "10763" "10215" " .949"
> > "application/vnd.openxmlformats-officedocument.presentationml.presentation" "application/vnd.openxmlformats-officedocument.presentationml.presentation" "5208977" "4877451" " .936"
> > "application/vnd.ms-excel" "application/vnd.ms-excel" "9910545" "9259787" " .934"
> > "application/fits" "application/fits" "203383" "188616" " .927"
> > "application/vnd.openxmlformats-officedocument.wordprocessingml.template" "application/vnd.openxmlformats-officedocument.wordprocessingml.template" "99755" "92083" " .923"
> > "application/vnd.ms-word.document.macroenabled.12" "application/vnd.ms-word.document.macroenabled.12" "108546" "99368" " .915"
> > "application/x-tika-ooxml" "application/x-tika-ooxml" "16285504" "14863678" " .913"
> > "text/x-csrc; charset=ISO-8859-1" "text/x-csrc; charset=ISO-8859-1" "10061" "8988" " .893"
> > "application/x-tika-ooxml-protected" "application/x-tika-ooxml-protected" "76261" "67978" " .891"
> > "application/vnd.visio" "application/vnd.visio" "265591" "231051" " .870"
> > "application/vnd.openxmlformats-officedocument.wordprocessingml.document" "application/vnd.openxmlformats-officedocument.wordprocessingml.document" "12533893" "10606281" " .846"
> > "application/x-7z-compressed" "application/x-7z-compressed" "24736" "20670" " .836"
> > "application/vnd.android.package-archive" "application/vnd.android.package-archive" "63069" "52426" " .831"
> > "application/x-dbf; format=FoxBASE_plus" "application/x-dbf; format=FoxBASE_plus" "165078" "136581" " .827"
> > "text/plain; charset=windows-1252" "text/plain; charset=windows-1252" "2005334" "1617215" " .806"
> > "text/html; charset=EUC-JP" "text/html; charset=EUC-JP" "135216" "108702" " .804"
> > "application/x-msdownload" "application/x-msdownload" "22394" "17967" " .802"
> > "application/msword" "application/msword" "25225956" "20140865" " .798"
> > "application/xhtml+xml; charset=UTF-8" "application/xhtml+xml; charset=UTF-8" "1369233" "1084872" " .792"
> > "image/png" "image/png" "8024951" "6311528" " .786"
> > "text/plain; charset=UTF-8" "text/csv; charset=UTF-8; delimiter=comma" "102382" "80396" " .785"
> > "application/xhtml+xml; charset=Shift_JIS" "application/xhtml+xml; charset=Shift_JIS" "13145" "10310" " .784"
> > "text/plain; charset=UTF-8" "text/tsv; charset=UTF-8; delimiter=tab" "22314" "17184" " .770"
> > "text/x-perl; charset=ISO-8859-1" "text/x-perl; charset=ISO-8859-1" "19489" "14817" " .760"
> > "text/plain; charset=ISO-8859-1" "text/plain; charset=ISO-8859-1" "4776706" "3578705" " .749"
> > "text/html; charset=windows-1251" "text/html; charset=windows-1251" "38319" "28496" " .744"
> > "image/jpeg" "image/jpeg" "27188807" "19936262" " .733"
> > "application/x-bibtex-text-file; charset=UTF-8" "application/x-bibtex-text-file; charset=UTF-8" "15012" "10946" " .729"
> > "text/html; charset=GBK" "text/html; charset=GBK" "14763" "10547" " .714"
> > "application/octet-stream" "application/octet-stream" "637102" "450923" " .708"
> > "application/vnd.ms-wordml" "application/vnd.ms-wordml" "10319" "7289" " .706"
> > "text/plain; charset=UTF-8" "text/plain; charset=UTF-8" "431635" "303633" " .703"
> > "application/x-msaccess" "application/x-msaccess" "511222" "357344" " .699"
> > "application/vnd.google-earth.kmz" "application/vnd.google-earth.kmz" "178217" "124078" " .696"
> > "application/x-executable" "application/x-executable" "32426" "22437" " .692"
> > "application/x-ms-asx" "application/x-ms-asx" "81547" "54927" " .674"
> > "text/calendar; charset=UTF-8" "text/calendar; charset=UTF-8" "366484" "246181" " .672"
> > "text/x-vcalendar; charset=ISO-8859-1" "text/x-vcalendar; charset=ISO-8859-1" "37525" "25065" " .668"
> > "text/x-matlab; charset=UTF-8" "text/x-matlab; charset=UTF-8" "14861" "9868" " .664"
> > "application/x-tex; charset=ISO-8859-1" "application/x-tex; charset=ISO-8859-1" "19970" "13224" " .662"
> > "text/x-php; charset=ISO-8859-1" "text/x-php; charset=ISO-8859-1" "11407" "7497" " .657"
> > "application/x-grib" "application/x-grib" "16179" "10526" " .651"
> > "text/calendar; charset=ISO-8859-1" "text/calendar; charset=ISO-8859-1" "76839" "49991" " .651"
> > "text/plain; charset=UTF-16LE" "text/plain; charset=UTF-16LE" "23474" "15229" " .649"
> > "text/calendar; charset=windows-1252" "text/calendar; charset=windows-1252" "574421" "371225" " .646"
> > "application/x-bibtex-text-file; charset=windows-1252" "application/x-bibtex-text-file; charset=windows-1252" "49670" "32081" " .646"
> > "text/x-matlab; charset=windows-1252" "text/x-matlab; charset=windows-1252" "21890" "14083" " .643"
> > "application/x-bibtex-text-file; charset=ISO-8859-1" "application/x-bibtex-text-file; charset=ISO-8859-1" "82478" "52614" " .638"
> > "video/x-msvideo" "video/x-msvideo" "24521" "15634" " .638"
> > "text/x-vcard; charset=windows-1252" "text/x-vcard; charset=windows-1252" "43197" "27502" " .637"
> > "text/x-vcard; charset=ISO-8859-1" "text/x-vcard; charset=ISO-8859-1" "38629" "24563" " .636"
> > "text/x-matlab; charset=ISO-8859-1" "text/x-matlab; charset=ISO-8859-1" "66814" "41767" " .625"
> > "text/x-diff; charset=ISO-8859-1" "text/x-diff; charset=ISO-8859-1" "28490" "17717" " .622"
> > "video/mpeg" "video/mpeg" "99047" "60633" " .612"
> > "application/vnd.openxmlformats-officedocument" "application/vnd.openxmlformats-officedocument" "14021" "8541" " .609"
> > "application/x-sas-data" "application/x-sas-data" "20148" "12085" " .600"
> > "application/x-endnote-refer" "application/x-endnote-refer" "37721" "22427" " .595"
> > "audio/vnd.wave" "audio/vnd.wave" "2025503" "1188124" " .587"
> > "image/vnd.dwg" "image/vnd.dwg" "54326" "31554" " .581"
> > "application/x-mspublisher" "application/x-mspublisher" "827994" "479060" " .579"
> > "video/x-flv" "video/x-flv" "17035" "9604" " .564"
> > "audio/x-flac" "audio/x-flac" "153508" "86400" " .563"
> > "video/quicktime" "video/quicktime" "14326" "7974" " .557"
> > "application/x-netcdf" "application/x-netcdf" "150752" "82587" " .548"
> > "application/x-dvi" "application/x-dvi" "52402" "28348" " .541"
> > "image/tiff" "image/tiff" "549514" "296830" " .540"
> > "application/x-tika-msoffice" "application/x-tika-msoffice" "5057216" "2645255" " .523"
> > "application/x-mobipocket-ebook" "application/x-mobipocket-ebook" "139851" "72051" " .515"
> > "application/x-shapefile" "application/x-shapefile" "13493" "6946" " .515"
> > "application/x-hdf" "application/x-hdf" "66076" "33825" " .512"
> > "image/vnd.dxf; format=ascii" "image/vnd.dxf; format=ascii" "117691" "59991" " .510"
> > "audio/x-ms-wma" "audio/x-ms-wma" "21392" "10785" " .504"
> > "image/vnd.djvu" "image/vnd.djvu" "186478" "93119" " .499"
> > "image/jp2" "image/jp2" "538909" "265405" " .492"
> > "application/vnd.rn-realmedia" "application/vnd.rn-realmedia" "11677" "5678" " .486"
> > "video/x-ms-asf" "video/x-ms-asf" "17676" "8387" " .474"
> > "audio/x-aiff" "audio/x-aiff" "97338" "45472" " .467"
> > "video/x-m4v" "video/x-m4v" "13351" "6207" " .465"
> > "audio/mpeg" "audio/mpeg" "1702142" "782377" " .460"
> > "image/x-portable-pixmap" "image/x-portable-pixmap" "29323" "13385" " .456"
> > "application/xhtml+xml; charset=windows-1251" "application/xhtml+xml; charset=windows-1251" "14584" "6591" " .452"
> > "video/3gpp" "video/3gpp" "54044" "23964" " .443"
> > "video/x-ms-wmv" "video/x-ms-wmv" "58120" "25581" " .440"
> > "video/quicktime" "application/mp4" "28043" "11520" " .411"
> > "video/mp4" "video/mp4" "95073" "39042" " .411"
> > "application/vnd.apple.keynote" "application/vnd.apple.keynote" "17162" "6532" " .381"
> > "application/mp4" "application/mp4" "160396" "48739" " .304"
> > "application/x-ms-installer" "application/x-ms-installer" "22090" "4952" " .224"
> >
> > On Fri, May 3, 2019 at 1:56 PM Tim Allison wrote:
> > >
> > > All,
> > > I've kicked off the regression tests. I should have results by
> > > Tuesday. Let me know if there's anything else you'd like to get in
> > > before 1.21.
> > > I can rerun the regression tests on Monday if desired.
> > >
> > > Cheers,
> > >
> > > Tim
> > >
> > > On Tue, Apr 23, 2019 at 8:21 PM Konstantin Gribov wrote:
> > > >
> > > > Tim,
> > > >
> > > > I'm +1 since I've pushed TIKA-2555/TIKA-2601. But I'm going to look
> > > > through ossindex-maven-plugin:audit results.
> > > >
> > > > Maybe I'll do some cleanup (like using lambdas instead of anonymous
> > > > classes, diamond op etc) but that's not a blocker ,)
> > > >
> > > > --
> > > > Best regards,
> > > > Konstantin Gribov.
> > > >
> > > >
> > > > On Tue, Apr 23, 2019 at 9:04 AM Oleg Tikhonov wrote:
> > > >
> > > > > +1 to wait if needed.
> > > > >
> > > > > On Mon, Apr 22, 2019, 23:23 Tim Allison wrote:
> > > > >
> > > > > > All,
> > > > > > I just made a bunch of upgrades to our dependencies. I still want
> > > > > > to take a first pass at TIKA-2749...maybe by the end of this week
> > > > > > with release process kicking off the following week? I could start
> > > > > > the regression tests now (well, tomorrowish), though, unless anyone
> > > > > > has anything they want to get in...I'm happy to wait, though, till
> > > > > > next week to start the regression tests.
> > > > > > WDYT?
> > > > > >
> > > > > > Cheers,
> > > > > >
> > > > > > Tim
> > > > > >
> > > > > > On Mon, Apr 8, 2019 at 2:25 PM Oleg Tikhonov wrote:
> > > > > > >
> > > > > > > Great!
> > > > > > > +1.
> > > > > > > Thanks,
> > > > > > > Oleg
> > > > > > >
> > > > > > > On Mon, Apr 8, 2019, 21:11 Tim Allison wrote:
> > > > > > >
> > > > > > > > All,
> > > > > > > > PDFBox will be out in a few days, and POI should be out soon
> > > > > > > > as well.
> > > > > > > > I _think_ I'd like to get in a first draft of "auto" mode for
> > > > > > > > OCR'ing PDFs (TIKA-2749), but other than that, I'd be willing
> > > > > > > > to run a release of 1.21 in the next few weeks.
> > > > > > > > WDYT?
> > > > > > > >
> > > > > > > > Best,
> > > > > > > >
> > > > > > > > Tim
