As with any multiple-byte operand with no alignment requirements, the second operand of TRT (containing the function bytes) can span a cache line or page boundary. So, unless the programmer is exceedingly confident of the content of the first operand of TRT (i.e., the stuff that's being parsed), she or he MUST assume that any byte of the 256-byte second operand may be accessed. This is not a fault of the instruction ... it's just how it is!
If you're writing typical application code, that's all you need to worry about. Sure, the second operand could cross a cache line, requiring a delay to fetch the data from a higher-level cache or from main memory. But, assuming this is relatively frequently executed code, once the data are fetched, that's over ... and the cache line will stay hot if it continues to be frequently referenced. Similarly, for a page-translation exception, if we assume that the program is bug-free, then the result will simply be a page fault, the OS will roll in the page frame, and (again, assuming frequent execution) the page will stay resident. If the code is not frequently executed, any angst over performance is somewhat moot.
As to parsing most languages, delimiting characters usually occur within the first 128 code points for both ASCII and EBCDIC (although EBCDIC alphabetic and numeric codes are in the second 128). This is also true for UTF-16 characters; that is, the delimiting characters like common punctuation and white space map to the first 128 bytes of the function-code table.
TRTE and TRTRE contain an interesting feature that allows you to parse a double-byte first operand (e.g., UTF-16) while requiring only a 256-byte function-code table; any first-operand character greater than 255 is assumed to have a function code of zero. This feature is specifically designed for sifting out common language delimiters in 2-byte character sets.
SHARE session 1245 (from SHARE 113 in Denver, 2009) illustrates (among other things) a finite-state parser, comparing TRT versus TRTE. This session also contains suggestions on timing various application code fragments, so you can figure out for yourself how fast or slow a code sequence really is. If you can't find it on the SHARE web site, send me a back-channel note and I'll forward a copy.
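For anyone who wants to see the table-lookup idea outside of assembler, here is a rough Python model of the two scans discussed above. It is only a sketch of the semantics, not of the instructions themselves: the function names (build_table, trt_scan, trte_scan_2byte) are made up for illustration, the delimiters use ASCII code points rather than EBCDIC, and registers, condition codes, operand lengths, and the TRTE control bits are all ignored.

```python
def build_table(delimiters):
    """Build a 256-byte function-code table: 1 for each delimiter, 0 elsewhere.

    Hypothetical helper; every delimiter must have a code point below 256.
    """
    table = bytearray(256)
    for ch in delimiters:
        table[ord(ch)] = 1
    return bytes(table)


def trt_scan(data, table):
    """Model a TRT-style scan over single-byte data.

    Returns (offset, function_code) for the first byte whose function
    code is nonzero, or None if every byte maps to zero.
    """
    for i, b in enumerate(data):
        code = table[b]          # any of the 256 table bytes may be touched
        if code != 0:
            return i, code
    return None


def trte_scan_2byte(data, table):
    """Model a TRTE-style scan over big-endian 2-byte characters.

    Only a 256-byte table is needed: any character value above 255
    is treated as having a function code of zero.
    """
    for i in range(0, len(data) - 1, 2):
        ch = (data[i] << 8) | data[i + 1]
        code = table[ch] if ch < 256 else 0
        if code != 0:
            return i, code
    return None


table = build_table(" ,;")
print(trt_scan(b"abc,def", table))
print(trte_scan_2byte("a\u03a9;b".encode("utf-16-be"), table))
```

In the second call, the Greek letter (U+03A9) is above 255, so it is skipped without any table access, and the scan stops at the semicolon at byte offset 4.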
